AD-A261  035 


ARI  Research  Note  93-09 




Improving  the  Selection,  Classification,  and 
Utilization  of  Army  Enlisted  Personnel 


Annual  Report,  1987  Fiscal  Year 
Supplement  to 
ARI  Technical  Report  862 

Human  Resources  Research  Organization 
American Institutes for Research
Personnel Decisions Research Institute
U.S. Army Research Institute


Contracting  Officer’s  Representative 
Lawrence  M.  Hanser 


Selection  and  Classification  Technical  Area 
Michael  G.  Rumsey,  Chief 

Manpower  and  Personnel  Research  Division 
Zita  M.  Simutis,  Director 


93-02145 



November  1992 


United  States  Army 

Research  Institute  for  the  Behavioral  and  Social  Sciences 


Approved for public release; distribution is unlimited.

98  2  8  003 


DISCLAIMER NOTICE


THIS DOCUMENT IS BEST
QUALITY AVAILABLE. THE COPY
FURNISHED TO DTIC CONTAINED
A SIGNIFICANT NUMBER OF
PAGES WHICH DO NOT
REPRODUCE LEGIBLY.


U.S.  ARMY  RESEARCH  INSTITUTE 

FOR  THE  BEHAVIORAL  AND  SOCIAL  SCIENCES 


A  Field  Operating  Agency  Under  the  Jurisdiction 
of  the  Deputy  Chief  of  Staff  for  Personnel 


EDGAR  M.  JOHNSON 
Acting  Director 

Research  accomplished  under  contract 
for  the  Department  of  the  Army 

Human  Resources  Research  Organization 

Technical  review  by 
Michael  G.  Rumsey 


NOTICES 

DISTRIBUTION:  This  report  has  been  cleared  for  release  to  the  Defense  Technical  Information 
Center  (DTIC)  to  comply  with  regulatory  requirements.  It  has  been  given  no  primary  distribution 
other  than  to  DTIC  and  will  be  available  only  through  DTIC  or  the  National  Technical 
Information  Service  (NTIS). 

FINAL  DISPOSITION:  This  report  may  be  destroyed  when  it  is  no  longer  needed.  Please  do  not 
return  it  to  the  U.S.  Army  Research  Institute  for  the  Behavioral  and  Social  Sciences. 

NOTE: The views, opinions, and findings in this report are those of the author(s) and should not
be  construed  as  an  official  Department  of  the  Army  position,  policy,  or  decision,  unless  so 
designated  by  other  authorized  documents. 


REPORT  DOCUMENTATION  PAGE 


1. AGENCY USE ONLY (Leave blank)


2.  REPORT  DATE 

1992,  November 


3. REPORT TYPE AND DATES COVERED

Interim  Oct  86  -  Sep  87 


4. TITLE AND SUBTITLE Improving the Selection, Classification,
and  Utilization  of  Army  Enlisted  Personnel:  Annual  Report, 
1987  Fiscal  Year  Supplement  to  ARI  Technical  Report  862 


6.  AUTHOR(S)  Human  Resources  Research  Organization, 
American  Institutes  for  Research,  Personnel  Decisions 

Research  Institute,  U.S.  Army  Research  Institute 
for  the  Behavioral  and  Social  Sciences  _ 


7. PERFORMING ORGANIZATION NAME(S) AND ADDRESS(ES)

Human  Resources  Research  Organization 
66  Canal  Center  Plaza,  Suite  400 
Alexandria,  Virginia  22314-4499 


5.  FUNDING  NUMBERS 

MDA903-82-C-0531 

63007A 

792 

232 

C71 


8.  PERFORMING  ORGANIZATION 
REPORT  NUMBER 

HumRRO IR-PRD-88-23


10. SPONSORING / MONITORING
AGENCY  REPORT  NUMBER 


ARI  Research  Note  93-09 


9. SPONSORING / MONITORING AGENCY NAME(S) AND ADDRESS(ES)

U.S.  Army  Research  Institute  for  the  Behavioral  and 
Social  Sciences 
Attn:  PERI-RS 
5001  Eisenhower  Avenue 
Alexandria,  VA  22333-5600 


11. SUPPLEMENTARY NOTES Project A: Improving the Selection, Classification, and Utilization
of Army Enlisted Personnel (Human Resources Research Organization, American
Institutes for Research, Personnel Decisions Research Institute, U.S. Army Research


12a.  DISTRIBUTION  /  AVAILABILITY  STATEMENT 


12b.  DISTRIBUTION  CODE 


Approved  for  public  release; 
distribution  is  unlimited. 


13. ABSTRACT (Maximum 200 words)

The materials presented in this report were prepared under Project A, the U.S.
Army's large-scale manpower and personnel effort for improving the selection,
classification, and utilization of Army enlisted personnel. This Research Note
supplements the U.S. Army Research Institute for the Behavioral and Social Sciences'
Technical Report 862, the project annual report for the 1987 fiscal year. It
augments that report by providing copies of a set of technical papers prepared during
the year to report on detailed phases of the project research methods and results.


14.  SUBJECT  TERMS 
Army-wide  measures 
Hands-on  tests 
Project  A  ratings 


17.  SECURITY  CLASSIFICATION 
OF  REPORT 

Unclassified 


NSN  7540-01-280-5500 


Classification 
Knowledge  tests 
Selection 


Criterion  measures 
Predictor  measures 
Soldier  effectiveness 


18.  SECURITY  CLASSIFICATION 
OF  THIS  PAGE 

Unclassified 


19.  SECURITY  CLASSIFICATION 
OF ABSTRACT

Unclassified 


15.  NUMBER  OF  PAGES 

610 


16. PRICE CODE


20.  LIMITATION  OF  ABSTRACT 


Unlimited 


Standard  Form  298  (Rev  2-89) 

Prescribed by ANSI Std.


EDITORS'  PREFACE 


In  the  course  of  executing  the  research  program  of  Project  A,  it  has 
always been an accepted, indeed priority, practice to find mechanisms and
means  for  communicating  and  sharing  early  or  otherwise  salient  research 
results  and  activities  with  the  U.S.  Army  and  with  the  professional  research 
community  at  large.  As  a  result,  numerous  papers,  reports,  and  symposium 
proceedings  have  been  produced  each  year  to  meet  the  continuing  interest  of 
both  scientific  and  operational  audiences.  The  custom  within  Project  A  has 
been  to  compile  these  documents  and  to  publish  them  as  an  adjunct  to  the 
Project  A  Annual  Report. 

The  papers  in  this  supplement  to  the  fiscal  year  1987  annual  report  are 
grouped  according  to  presentation  at  four  professional  meetings  during  the 
year.  Many  of  the  papers  are  referenced  in  the  annual  report.  That  some  are 
not  should  in  no  way  diminish  their  importance  or  relevance  to  the  readers  of 
these reports. Each document was produced to meet a specific need and audience
and, when taken in context, provides in effect a chronology of reports
and  communications  that  reveal  the  process  and  flow  of  the  overall  research 
program  being  accomplished  collegially  by  the  U.S.  Army  Research  Institute  for 
the  Behavioral  and  Social  Sciences  and  contractor  scientists.  In  many  cases 
these  findings  have  been  further  refined  or  synthesized  into  more  formal 
contract-deliverable  items. 


Lawrence  M.  Hanser 
Lola  M.  Zook 




IMPROVING  THE  SELECTION,  CLASSIFICATION,  AND  UTILIZATION 
OF  ARMY  ENLISTED  PERSONNEL 

ANNUAL  REPORT,  1987  FISCAL  YEAR  SUPPLEMENT  TO  ARI  TECHNICAL  REPORT  862 


CONTENTS

Page

Purpose of the Report . 1

Overview of Project A . 1

Papers Presented at the Annual Conference of the Military Testing
Association, Mystic, Connecticut, November 1986

Arabian, J. M., & Mason, J. K. Relationship of SQT Scores to
Project A Measures . 5

Campbell, C. H., & Rumsey, M. G. Skill Requirement Influences on
Measurement Method Intercorrelations . 13

Campbell, J. P., Hanser, L. M., & Wise, L. The Development of a
Model of the Project A Criterion Space . 21

Campbell, R. C., Campbell, C. H., & Doyle, E. L. Patterns in Skill
Level One Performance in Representative Army Jobs: Common and
Technical Task Comparisons . 33

Ford, P., & Hoffman, R. G. Effects of Test Programs on Task
Proficiency . 39

Gast, I. F., & White, L. A. Effects of Soldier Performance and
Characteristics on Relationships with Superiors . 45

Harris, J. H., Campbell, J. P., & Campbell, C. H. The Project A
Concurrent Validation Data Collection . 53

Hoffman, R. G. Post Differences in Hands-on Task Tests . 61

Hoffman, R. G., & Ford, P. Estimates of Task Parameters for Test
and Training Development . 67

McHenry, J. J., Harris, J. H., & Oppler, S. M. Using Confirmatory
Factor Analysis To Aid in Assessing Task Performance . 75

Olson, D. M., & Borman, W. C. Influence of Environment, Ability,
and Temperament on Performance in Army MOS . 83
Peterson, N., Hough, L., Ashworth, S., & Toquam, J. New Predictors
of Soldier Performance . 91

Radtke, P., & Edwards, D. S. Effect of Practice on Soldier Task
Performance . 99

Smith, E. P., & Rossmeissl, P. G. Some Conditions Affecting
Assessment of Job Requirements . 107

Smith, E. P., & Walker, C. B. Short Versus Long Term Tenure as a
Criterion for Validating Biodata . 115

Wise, L. L., McHenry, J. J., Rossmeissl, P. G., & Oppler, S. H.
ASVAB Validities Using Improved Job Performance Measures . 123


Papers Presented at a Data Analysis Workshop of the Committee
on Performance of Military Personnel, Baltimore, December 1986

Campbell, J. P. Validation Analysis for New Predictors . 131

McHenry, J. J., Wise, L. L., Campbell, J. P., & Hanser, L. M.
A Latent Structure Model of Job Performance Factors: Appendix . 167

Wise, L. L., McHenry, J. J., & Young, W. Y. Project A Concurrent
Validation: Treatment of Missing Data . 203


Papers Presented at the Annual Conference of the Society
for Industrial and Organizational Psychology, Atlanta,
April 1987

Campbell, C. H., Borman, W. C., Felker, D. C., Ford, P., Park, M. V.,
Pulakos, E. D., Riegelhaupt, B. J., & Rumsey, M. G. Development
of Project A Job Performance Measures . 229

Campbell, J. P., McHenry, J. J., & Wise, L. L. Analysis of
Criterion Measures: The Modeling of Performance . 239

Hough, L. M., & Ashworth, A. D. Predicting Soldier Performance:
Assessment of Temperament Constructs as Predictors of Job
Performance . 285

McHenry, J. J., Hough, L. M., Toquam, J. L., Hanson, M. A., &
Ashworth, S. Project A Validity Results: The Relationship
Between Predictor and Criterion Domains . 313


Peterson, N. G., Hough, L. M., Dunnette, M. D., Rosse, R. L.,
Houston, J. S., Toquam, J. L., & Wing, H. Identification
of Predictor Constructs and Development of New Selection/
Classification Tests . 343

Pulakos, E. D., White, L. A., & Borman, W. C. An Examination
of Race and Sex Effects on Performance Ratings . 375

Shields, J. L., & Hanser, L. M. Designing, Planning, and Selling
Project A . 393

Wing, H., Hough, L. M., & Peterson, N. G. Predicting Validity
of Noncognitive Measures for Army Classification and Attrition . 399

Wise, L. L., Campbell, J. P., & Peterson, N. G. Identifying
Optimal Predictor Composites and Testing for Generalizability
Across Jobs and Performance Constructs . 415

Young, W. Y., Houston, J. S., Harris, J. H., Hoffman, R. G., &
Wise, L. L. Large-Scale Data Collection and Data Base
Preparation . 433


Papers Presented at the Annual Convention of the American
Psychological Association, New York, August 1987

Barge, B. N. Characteristics of Biodata Items and Their
Relationship to Validity . 453

Gast, I. F., Campbell, C. H., Steinberg, A. G., & McGarvey, D. A.
A Task-Based Approach for Identifying Junior Noncommissioned
Officers' Key Responsibilities . 479

Hough, L. M. Overcoming Objections to the Use of Temperament
Variables in Selection . 509

Nord, R., & White, L. A. Optimal Job Assignment and the Utility
of Performance: Some Key Issues . 543

Pulakos, E. D., Hanson, M. A., Borman, W. C., Hallam, G., Carter, G.,
& Owens-Kurtz, C. Developing Behavioral Rating Scales
To Evaluate Second-Tour Performance in the Army . 569

Rumsey, M. G. Getting Answers to the Right Questions: Job Analysis
Strategy . 595




IMPROVING THE SELECTION, CLASSIFICATION, AND UTILIZATION
OF ARMY ENLISTED PERSONNEL

ANNUAL REPORT, 1987 FISCAL YEAR
SUPPLEMENT TO ARI TECHNICAL REPORT 862


PURPOSE  OF  THE  REPORT 

The  materials  presented  in  this  report  were  prepared  under  Project  A, 
the  U.S.  Army's  current,  large-scale  manpower  and  personnel  effort  for 
improving  the  selection,  classification,  and  utilization  of  Army  enlisted 
personnel. This Research Note supplements ARI Technical Report 862, the
Project  Annual  Report  for  the  1987  Fiscal  Year.  It  augments  that  report  by 
providing  copies  of  a  set  of  technical  papers  that  were  prepared  during  the 
year  reporting  on  detailed  phases  of  the  project  research  methods  and 
results. 

OVERVIEW  OF  PROJECT  A 

Project  A  is  a  comprehensive  long-range  research  and  development 
program  the  U.S.  Army  has  undertaken  to  develop  an  improved  system  for 
selecting  and  classifying  enlisted  personnel.  The  Army's  goal  is  to 
increase  its  effectiveness  in  matching  first-tour  enlisted  manpower 
requirements  with  available  personnel  resources,  through  use  of  new  and 
improved  selection/classification  tests  that  will  validly  predict  carefully 
developed  measures  of  job  performance.  The  project  addresses  the  Army's 
675,000-person  enlisted  personnel  system  encompassing  several  hundred 
military  occupations. 

The  program  began  in  1980,  when  the  U.S.  Army  Research  Institute  (ARI) 
started  planning  the  extensive  research  needed  to  develop  the  desired 
system.  In  1982  ARI  selected  a  consortium,  led  by  Human  Resources  Research 
Organization  (HumRRO)  and  including  American  Institutes  for  Research  (AIR) 
and Personnel Decisions Research Institute (PDRI), to undertake the 9-year
project.  It  is  utilizing  the  services  of  40  to  50  ARI  and  consortium 
researchers  working  collegially  in  a  variety  of  professional  specialties. 
The Project A objectives are to:

• Validate existing selection measures against both existing
and project-developed criteria (including both Army-wide
job performance measures based on rating scales, and direct
hands-on measures of MOS-specific task performance).

• Develop and validate new selection and classification measures.

• Validate intermediate criteria, such as training performance,
as predictors of later criteria, such as job performance, so
that better informed decisions on reassignment and promotion
can be made throughout a soldier's career.

• Determine the relative utility to the Army of different
performance levels across MOS.

• Estimate the relative effectiveness of alternative selection
and classification procedures in terms of their validity and
utility for making decisions.

The  research  design  incorporates  three  main  stages  of  data  collection 
and  analysis  in  an  iterative  progression  of  development,  testing, 
evaluation,  and  further  development  of  selection/classification  instruments 
(predictors)  and  measures  of  job  performance  (criteria).  In  the  first 
iteration,  file  data  from  fiscal  years  (FY)  1981/1982  were  evaluated  to 
explore  relationships  between  scores  of  applicants  on  the  Armed  Services 
Vocational  Aptitude  Battery  (ASVAB),  and  their  later  performance  in  training 
and  their  scores  on  first-tour  Skill  Qualification  Tests  (SQT). 

For the ensuing research, 19 Military Occupational Specialties (MOS)
were selected as a representative sample of the Army's 250+ entry-level MOS.
The selection was based on an initial clustering of MOS derived from rated
similarities of job content. These MOS account for about 45 percent of Army
accessions and provide sample sizes large enough so that race and sex
fairness can be empirically evaluated in most MOS.

In  the  second  iteration,  a  Concurrent  Validation  design  was  executed 
with  FY83/84  accessions.  A  "Preliminary  Battery"  of  perceptual,  spatial, 
temperament,  interest,  and  biodata  predictor  measures  was  developed  and 
tested  with  several  thousand  soldiers  as  they  entered  four  MOS.  The  data 
from  this  sample  were  then  used  to  refine  the  measures,  with  further 
exploration  of  content  and  format.  The  revised  set  of  measures  was  field 
tested  to  assess  reliabilities,  "fakability,"  practice  effects,  and  other 
factors.  The  resulting  predictor  battery,  the  "Trial  Battery,"  was 
administered  together  with  a  comprehensive  set  of  job  performance  indexes 
based  on  job  knowledge  tests,  hands-on  job  samples,  and  performance  rating 
measures,  in  the  Concurrent  Validation  during  the  summer  and  fall  of  1985. 
The  results  of  the  Concurrent  Validation  were  used  to  form  five  performance 
constructs  and  to  report  to  the  Army  incremental  validities  of  the  Trial 
Battery  components  over  ASVAB  predictors. 

On the basis of testing experience, the "Trial Battery" was revised as
the "Experimental Predictor Battery," which in turn is being administered in
the third iteration, the Longitudinal Validation stage, which began in the
late summer of 1986. All measures are being administered in a true predictive
validity design. About 50,000 soldiers across 21 MOS are included in
the FY86-87 administration and subsequent first-tour measurement. About
3,500 of these soldiers are expected to be available for second-tour
performance measurement in FY91. Three MOS were added to the original 19 (19K,
29E, and 96B), and one of the original MOS was dropped (76W).

For administrative purposes, Project A is divided into five research
tasks: Task 1, Validity Analyses and Data Base Management; Task 2,
Developing Predictors of Job Performance; Task 3, Developing Measures of
School/Training Success; Task 4, Developing Measures of Army-Wide
Performance; Task 5, Developing MOS-Specific Performance Measures.

Activities during the first four years of Project A were reported as
follows: FY83, ARI Research Report 1347 and its Technical Appendix, ARI
Research Note 83-37; FY84, ARI Research Report 1393 and two related reports,
ARI Technical Report 660 and ARI Research Note 85-14; FY85, ARI Technical
Report 746 and ARI Research Note (in preparation); FY86, ARI Technical
Report 813101 and ARI Research Note 8913704.

Other publications on specific activities during those years are listed
in the above reports. The annual report on project-wide activities during
FY87 is presented in ARI Technical Report 862. The technical papers
reproduced in this Research Note serve as additional documentation for
various FY87 activities.




RELATIONSHIP  OF  SQT  SCORES  TO 
PROJECT  A  MEASURES 


Jane  M.  Arabian  and  Jeanne  K.  Mason 
U.S.  Army  Research  Institute 


Presented  on  Session,  “Test  Validation" 

At  the  Annual  Conference  of  the 
Military  Testing  Association 
Mystic,  Connecticut 

November  1986 


The views expressed in this paper are those of the authors and do not
necessarily reflect the official opinions and policies of the U.S. Army
Research Institute or the Department of the Army.




Relationship of SQT Scores to Project A Measures

Jane M. Arabian and Jeanne K. Mason
U.S. Army Research Institute
Alexandria, Virginia

The Army develops and administers Skill Qualification Tests (SQT) to soldiers
in many of the Military Occupational Specialties (MOS). The testing program
was originally intended to diagnose needs for training. However, SQT
scores are also used for personnel management decisions (e.g., promotion policy
decisions, distribution goals for soldier quality, etc.).

Although SQT are not developed for all MOS, particularly the smaller MOS,
the MOS that do have SQT represent a variety of occupational specialties and a
large proportion of Army accessions. Further, the test administration and
score reporting program is well-established, rendering the SQT scores readily
accessible to the Army research community. Since these skill tests are
administered to soldiers after school training (AIT), when soldiers have had
experience performing in their specialty, the SQT scores have been employed as proxy
measures of job performance to support personnel policy decisions. However, the
assumption that SQT can be validly used as a measure of job performance has not
been tested directly.

Converging evidence does suggest that SQT are viable measures of job
performance. For example, the distribution of SQT scores by ASVAB (Armed Services
Vocational Aptitude Battery) scores, more specifically Aptitude Area (AA)
composite scores from ASVAB, was employed by proponent schools to support
particular MOS AA entry score requirements. Along with the proponents' input, the
Army's submission to Congress on Army manpower quality goals also included data
on the relationship of written and hands-on performance scores, obtained from
TRASANA, with ASVAB scores (Office of the Assistant Secretary of Defense,
1985). While both sets of data (SQT/ASVAB and TRASANA data/ASVAB) produced
similar results, namely a positive relationship between ASVAB and the
performance measures, direct examination of the relationship between SQT and TRASANA
hands-on and written test scores was precluded by the small number of cases
available with both sets of scores. Consequently, it was not possible to
determine the validity of SQT scores as measures of job performance at that time.

With the collection of job performance data from the 1985 (concurrent
validation) testing phase of the Army's Project A, "Improving the Selection,
Classification and Utilization of Army Enlisted Personnel", and the merging of SQT
data into the Project's research database, it has become possible to validate
SQT scores against independently developed criteria of job performance. The
Project A measures selected for this SQT validation research include paper and
pencil measures of school knowledge and job knowledge as well as a work sample
(hands-on) measure of job proficiency. If the results of this research
demonstrate a strong positive relationship between SQT scores and the Project A
measures, then it could be confidently asserted that the SQT are valid measures
of job performance. Use of SQT data would then be empirically justified as a
measure of job performance for personnel decisions.


Method

Subjects

The subjects in the present research are a sub-sample of the Project A
concurrent validation sample. The data for Project A were collected from June
to November 1985. The soldiers were all at Skill Level 1 with 18 to 24 months
in the Army at the time of testing. The sub-sample had 3,117 soldiers
with test scores for each measure of interest (SQT and three Project A
measures as well as ASVAB). Soldiers from the following eight MOS were
represented in the sub-sample: 11B (Infantryman); 13B (Cannon Crewman); 19E (Tank
Crewman); 31C (Radio Teletype Operator); 63B (Light Wheeled Vehicle/Power
Generation Mechanic); 64C (Motor Transport Operator); 71L (Administrative Specialist);
95B (Military Police).

Measures

The SQT is a multiple choice, written test of overall MOS knowledge
designed for a 2-hour administration period. Soldiers are tested by MOS and
Skill Level. Tasks included in the SQT are randomly selected from the
Soldier's Manual for a given MOS. Approximately 20-35 tasks (maximum of 161
items) are included in an SQT. The notice announcing the test includes a list
of [illegible] of the tasks that will appear on the test. The overall SQT score is a
percentage computed by adding all scores from each task and dividing the sum by
the total number of tasks on the test. Further information about SQT
development and administration is available in the SQT Test Development Manual
(TRADOC, 1983). SQT scores used in the present research were from the 1985
administration with the exception of MOS 31C, whose scores were from 1986.
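
The scoring rule described above (sum the task scores, divide by the number of tasks) can be sketched in a few lines. This is only an illustration: it assumes each task score is already expressed as a percentage from 0 to 100, which the text implies but does not state, and the function name and example values are hypothetical rather than taken from the report.

```python
def sqt_overall_score(task_scores):
    """Overall SQT percentage: sum of per-task scores divided by the number of tasks.

    Assumption (not stated in the report): each task score is a percent, 0-100.
    """
    if not task_scores:
        raise ValueError("at least one task score is required")
    return sum(task_scores) / len(task_scores)


# Hypothetical per-task percent scores for one soldier (illustrative only).
example_tasks = [80.0, 100.0, 60.0, 90.0]
overall = sqt_overall_score(example_tasks)
print(overall)
```

Because every task carries equal weight under this rule, a task with few items influences the overall score as much as a task with many items.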

The Project A School Knowledge tests (K3), also labelled Job-Relevant
Knowledge tests, were developed to measure the cognitive component of training
(school) success. Test items were based on, e.g., the Army Occupational Survey
Program and Program of Instruction (course curriculum) information for each
MOS. All items were reviewed by job incumbents, school trainers and
appropriate MOS training proponents for content, accuracy, etc. The K3 test for each
MOS contained approximately 150 multiple choice items and was administered in a
2-hour period. A detailed description of the test development procedure and
psychometric properties of the tests can be found in R. Davis, G. Davis,
Joyner, and de Vera (1985).

The development process and psychometric properties of the Task-Based
MOS-Specific Job Knowledge (K5) and Hands-On (HO) measures are described in
C. Campbell, R. Campbell, Rumsey and Edwards (1985). Briefly, the job
performance domain for each MOS was determined from several sources, including: the
Army Occupational Survey Program results, Soldier's Manual of Common Tasks,
MOS-specific Soldier's Manuals, and input from the MOS proponent agency.
Subject matter experts provided judgments of task criticality, difficulty and
similarity. Separate panels of subject matter experts in each MOS used the
judgments to select MOS tasks for K5 measure development. The written K5
measures cover some 30 tasks and have approximately 150-200 multiple choice items
which require about 2 hours for administration. The HO measures are a sub-set
of 15 of the 30 tasks covered in the K5 measure for each MOS. Aside from
logistical constraints (e.g., tasks too hazardous to test), tasks selected for
testing in the HO mode entailed physical strength or skilled psychomotor
performance, performance within a time limit, many procedural steps, and/or steps
that are uncued in their normal sequence.

Data  and  Analyses 

A workfile was created from the Project A longitudinal research database.
The workfile contained Skill Level 1 SQT score, average percent correct K3, K5
and HO scores and ASVAB AA composite score for each case (subject). The AA
score is used in the Army enlistment process as the primary classification
eligibility measure for each MOS. Univariate descriptive statistics and
correlation analyses were performed using the SAS statistical package.
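
The analyses in the paper were run in SAS. As a rough stand-in, the sketch below shows the same kind of per-MOS summary (mean, standard deviation, N, minimum, maximum, and Pearson correlations with SQT) using only the Python standard library. The record layout, field names, and example values are assumptions made for illustration; they are not the Project A workfile format or data.

```python
from collections import defaultdict
from math import sqrt
from statistics import mean, stdev


def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def summarize_by_mos(records, variables=("SQT", "K3", "K5", "HO")):
    """Descriptive statistics and correlations with SQT, per MOS, complete cases only."""
    by_mos = defaultdict(list)
    for rec in records:
        # Keep only cases with a score on every variable of interest.
        if all(rec.get(v) is not None for v in variables):
            by_mos[rec["MOS"]].append(rec)

    summary = {}
    for mos, rows in by_mos.items():
        stats = {}
        for v in variables:
            vals = [r[v] for r in rows]
            stats[v] = {"mean": mean(vals), "sd": stdev(vals), "n": len(vals),
                        "min": min(vals), "max": max(vals)}
        sqt = [r["SQT"] for r in rows]
        corr = {v: pearson_r(sqt, [r[v] for r in rows])
                for v in variables if v != "SQT"}
        summary[mos] = {"stats": stats, "r_with_SQT": corr}
    return summary


# Hypothetical records (illustrative values only); the last case is incomplete.
records = [
    {"MOS": "11B", "SQT": 70, "K3": 60, "K5": 55, "HO": 65},
    {"MOS": "11B", "SQT": 80, "K3": 70, "K5": 60, "HO": 60},
    {"MOS": "11B", "SQT": 90, "K3": 80, "K5": 70, "HO": 75},
    {"MOS": "11B", "SQT": 85, "K3": None, "K5": 66, "HO": 70},
]
summary = summarize_by_mos(records)
```

Dropping incomplete cases before any statistic is computed keeps every coefficient for a given MOS based on the same soldiers, which matches the paper's use of cases with scores on each measure of interest.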




Results and Discussion

The univariate descriptive statistics for each performance variable (SQT,
K3, K5 and HO) by MOS are presented below. There is satisfactory variance and
range in the data to permit further analyses.

Table 1

Descriptive statistics for SQT, K3, K5 and HO by MOS. [Table values are not legible in the source copy.]


Correlations were obtained between the appropriate AA composite score, SQT,
and each Project A performance measure by MOS for cases with complete data. The
correlations, in the table below, are generally consistent with data from other
studies. The SQT are positively correlated with the ASVAB AA composite scores
as well as with the Project A performance measures. Since the focus of this
report is on the relationship between SQT and other measures (i.e., K3, K5 and
HO) of job performance, weighted averages using the Fisher z transformation
were computed across MOS only for the SQT and Project A correlations and the
intercorrelations among the Project A measures. As would be expected, the
correlations between same mode measures (paper and pencil, e.g., SQT:K5, K3:K5)
are somewhat higher than the cross-mode (paper and pencil vs hands-on, e.g.,
K3:HO, SQT:HO) correlations.
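
Averaging correlations across MOS with the Fisher z transformation can be sketched as follows: each r is converted to z = atanh(r), the z values are averaged with weights, and the mean z is converted back with tanh. The paper does not say which weights were used; the sketch assumes the common choice of n - 3 (the inverse of the sampling variance of z), and the function name and example values are hypothetical.

```python
from math import atanh, tanh


def fisher_weighted_average(correlations, sample_sizes):
    """Weighted average of correlations via the Fisher z transformation.

    Assumption: weights are n - 3, a common convention; the report does not
    state the weights it used.
    """
    if len(correlations) != len(sample_sizes):
        raise ValueError("correlations and sample_sizes must have equal length")
    weights = [n - 3 for n in sample_sizes]
    if any(w <= 0 for w in weights):
        raise ValueError("every sample size must exceed 3")
    z_bar = sum(w * atanh(r) for r, w in zip(correlations, weights)) / sum(weights)
    return tanh(z_bar)


# Hypothetical per-MOS correlations and sample sizes (illustrative only).
r_by_mos = [0.40, 0.55, 0.62]
n_by_mos = [300, 450, 250]
average_r = fisher_weighted_average(r_by_mos, n_by_mos)
```

Averaging in the z metric rather than averaging the raw coefficients reduces the bias that comes from the skewed sampling distribution of r, particularly when the correlations are large.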




Table 2

Correlations among the AA composite, SQT, K3, K5 and HO by MOS for cases with complete data (N = 3,117), with weighted averages across MOS. [Table values are not legible in the source copy.]



The correlations between SQT and the three Project A measures were
corrected for attenuation and range restriction. The reliability estimates for
the Project A measures, used for the attenuation correction, are presented below.
SQT reliability estimates were not available; therefore, the corrections were
based on only the Project A measures.
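A correction for attenuation that uses the reliability of only one of the two
measures can be sketched as below. The function name and the example values
are hypothetical; the formula is the standard single-variable correction
(observed correlation divided by the square root of the reliability), which
is assumed here to correspond to the procedure described.

```python
import math

def correct_for_attenuation(observed_r, reliability):
    """Correct a correlation for unreliability in one measure only."""
    if not 0.0 < reliability <= 1.0:
        raise ValueError("reliability must be in (0, 1]")
    return observed_r / math.sqrt(reliability)

# Hypothetical values: observed r = .45, Project A measure reliability = .81.
corrected_r = correct_for_attenuation(0.45, 0.81)
```

Because the SQT reliability is left out, a correction of this kind is
conservative relative to one that adjusts for unreliability in both measures.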


Table 3

Internal Consistency Reliability Estimates

[Table values not legible in the scanned source.]


With respect to the correction for range restriction, a formula was
employed which is appropriate for the correlation of a new measure, such as the
Project A measures, with an existing criterion, the SQT, when selection has been
made on a third variable, in this case AA composite score (Guilford, 1965).
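The report cites Guilford (1965) but does not reproduce the formula. The
sketch below assumes the commonly used three-variable correction for
selection on a third variable; the function and argument names are
hypothetical, and the example values are not from the report.

```python
import math

def correct_range_restriction_third_variable(r_xy, r_xz, r_yz, sd_ratio):
    """Correct r_xy for selection on a third variable z.

    r_xy, r_xz, r_yz are correlations observed in the selected group;
    sd_ratio is the unrestricted SD of z divided by its restricted SD.
    """
    k = sd_ratio ** 2 - 1.0
    numerator = r_xy + r_xz * r_yz * k
    denominator = math.sqrt((1.0 + r_xz ** 2 * k) * (1.0 + r_yz ** 2 * k))
    return numerator / denominator

# Hypothetical inputs: x = Project A measure, y = SQT, z = AA composite.
corrected = correct_range_restriction_third_variable(0.40, 0.30, 0.50, 1.5)
```

When the SD ratio is 1 (no restriction on the selection variable) the
function returns the observed correlation unchanged, which is a quick sanity
check on the form of the correction.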

The correlations between SQT and the Project A measures, corrected for
attenuation and range restriction, are presented below. Again, weighted averages
of the validity coefficients across MOS were computed. It can be seen in the
table below that SQT is strongly correlated with each of the independent
measures of job performance. The somewhat lower average correlation between SQT
and HO scores may be attributable at least in part to measurement mode
differences (written vs hands-on).




Table 4

[Table values not legible in the scanned source.]

In order to compare scores across the four measures, equi-percentile
equating was performed; the results are presented below. Since 60 is used as the
passing score for SQT, the percentile for a score of 60 on SQT was used to
determine comparable (in terms of percentile) scores for the K3, K5 and HO
measures. Thus, 11B soldiers with an SQT score of 60 are in the 6.03
percentile. For the K3 measure, an 11B soldier in the 6.03 percentile would have
a score of 37. The percentile for SQT scores of 60, 70 and 80 were determined
along with the comparable scores on the Project A measures. Scores for SQT,
K3, K5 and HO tests at the 50th and 85th percentile were also calculated.

The lower scores on the Project A measures, compared to the SQT scores,
suggest that the Project A tests may have been somewhat more difficult. Whether
or not the apparent differences in difficulty can be attributed to test content
versus the opportunity to study for the test cannot be ascertained. However,
it should be noted that SQT test dates with 150% of the tasks to be covered are
published before testing; this is not the case with the Project A testing.
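The equating logic described above (find the percentile of an SQT score, then
read off the score at the same percentile on another measure) can be sketched
as follows. The percentile-rank convention, the function names, and the score
lists are all assumptions made for illustration; the report does not specify
its convention.

```python
def percentile_rank(scores, value):
    """Percent of scores strictly below `value` (an assumed convention)."""
    below = sum(1 for s in scores if s < value)
    return 100.0 * below / len(scores)

def score_at_percentile(scores, percentile):
    """Observed score at position floor(n * percentile / 100) in sorted order."""
    ordered = sorted(scores)
    index = min(int(len(ordered) * percentile / 100.0), len(ordered) - 1)
    return ordered[index]

# Hypothetical score distributions for one MOS (not report data).
sqt_scores = [55, 60, 65, 70, 75, 80, 85, 90, 95, 100]
k3_scores = [30, 40, 42, 48, 52, 57, 61, 66, 70, 75]

sqt_percentile = percentile_rank(sqt_scores, 60)
equated_k3 = score_at_percentile(k3_scores, sqt_percentile)
```

With real data the two distributions would be much larger, and some form of
interpolation between adjacent observed scores would usually be applied.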


[Table of equi-percentile equating results; values not legible in the scanned
source.]


The equi-percentile equating performed on this data set should not be taken to
suggest cut off scores for the Project A measures. (Nor would it be reasonable to
alter the SQT cut off given only the data presented here.) While it would be
possible to apply standard setting procedures to the Project A data, it would not
be advisable to use the SQT score of 60 to set standards on the other measures.
The primary reason for this position is that the SQT cut off score of 60 was not
necessarily derived empirically or validated against a definition of minimally
acceptable performance. In order to evaluate the SQT cut off, and perhaps
determine cut offs on the Project A tests, additional information would be needed
about satisfactory and unsatisfactory performance levels.

Conclusions 

Project  A  research  has  provided  a  unique  opportunity  to  validate  SQT  against 
independently  derived  measures  of  job  performance.  The  research  presented  in  this 
paper  strongly  supports  the  validity  of  SQT  as  a  measure  of  job  performance.  Although 
only  a  limited  number  of  MOS  were  in  the  sample,  the  variety  of  occupations  and  the 
consistency  of  the  results  suggest  that  SQT  in  general  (i.e.,  including  MOS  not  in  the 
sample)  may  serve  as  a  valid  measure  of  job  performance  for  personnel  management 
decisions.  Further  research  is  particularly  needed,  however,  to  validate  the  SQT  cut  off 
score. 


References 

Campbell, C. H., Campbell, R. C., Rumsey, M. G., & Edwards, D. C. (1985).
Development and field test of task-based MOS-specific criterion measures (ARI
Technical Report 717). Alexandria, VA: Army Research Institute.

Davis, R. H., Davis, G. A., Joyner, J. N., & de Vera, M. V. (1985). Development and
field test of job-relevant knowledge tests for selected MOS (ARI Technical Report
757). Alexandria, VA: Army Research Institute.

Guilford, J. P. (1965). Fundamental statistics in psychology and education (Fourth
Edition). New York: McGraw-Hill Book Company.

Office of the Assistant Secretary of Defense (Manpower, Installations and
Logistics). (1985). Report to the House and Senate Committee on Armed Services,
Quality of Military Enlisted.

SAS Institute, Inc. (1986). Statistical Analysis System, Version 82.4. Cary, NC: SAS
Institute, Inc.

TRADOC. (1983). Skill qualification tests (SQT): Policy and Procedures (TRADOC
Reg 351-2). Fort Monroe, VA: Department of the Army.

ACKNOWLEDGMENTS 

The opinions, views and conclusions contained in this document are those of the
authors and should not be interpreted as representing the official policies, expressed or
implied, of the U.S. Army Research Institute for the Behavioral and Social Sciences or
the Department of Defense or the United States Government. Project A is an Army
Research Institute contractual effort, #2Q263731A792, performed by Human Resources
Research Organization, American Institutes for Research, and Personnel Decisions
Research Institute. The authors wish to express their appreciation to Winnie Young for
preparing the data file used in this research.




SKILL  REQUIREMENT  INFLUENCES  ON 
MEASUREMENT METHOD INTERCORRELATIONS


Charlotte  H.  Campbell 
Human  Resources  Research  Organization 


Michael  G.  Rumsey 
U.S.  Army  Research  Institute 


Presented  on  Session,  "Issues  in  Hands-On  Performance  Testing" 

At  the  Annual  Conference  of  the 
Military  Testing  Association 
Mystic,  Connecticut 

November  1986 


The  views  expressed  in  this  paper  are  those  of  the  authors  and  do  not 
necessarily  reflect  the  official  opinions  and  policies  of  the  U.S.  Army 
Research  Institute  or  the  Department  of  the  Army. 




Skill  Requirement  Influences 
on  Measurement  Method  Intercorrelations 

Charlotte  H.  Campbell 
Human  Resources  Research  Organization 

Michael G. Rumsey

U.S.  Army  Research  Institute  for  the 
Behavioral  and  Social  Sciences 


The Army is currently engaged in a project, commonly referred to as
Project A, to develop a job-based selection and classification system. The
project  involves  the  linking  of  existing  and  newly  developed  predictor 
measures  to  measures  of  performance  in  the  Army.  The  success  of  the  project 
will  depend  in  no  small  part  on  the  degree  to  which  the  performance  measures 
accurately  and  comprehensively  reflect  actual  performance  of  Army  jobs. 

Toward the end of developing a comprehensive performance measurement system,
we have developed four different kinds of measures: ratings, administrative
measures,  hands-on  job  performance  (work  sample)  measures,  and  job  knowledge 
measures. 

Here we focus on two of the testing methods: hands-on performance tests
and  job  knowledge  tests.  It  has  been  suggested  that,  short  of  measurement  in 
an  actual  job  situation,  a  hands-on  test  has  the  highest  fidelity  of  any  type 
of  measure  (Vineberg  &  Taylor,  1978).  Yet,  probably  because  of  the  enormous 
expense  associated  with  hands-on  tests,  they  are  seldom  used.  Written  tests 
are  less  costly  to  administer  and  in  some  cases  may  be  as  appropriate  as,  or 
more  appropriate  than,  hands-on  tests.  To  use  an  example  presented  by 
Vineberg  and  Taylor  (1972),  a  knowledge  test  is  better  suited  to  assess  an 
automobile driver's knowledge of driving rules and road signs than a hands-on
test. 


It  is  of  considerable  practical  interest  to  know  the  extent  to  which  the 
two  testing  methods  are  interchangeable.  If  it  could  be  shown  that  both 
methods  provide  virtually  identical  information,  then  one  could  be  eliminated 
and  considerable  savings  could  be  achieved.  Otherwise,  one  must  consider  the 
possibility  that  each  type  of  measure  provides  a  unique,  valid  contribution 
to an overall assessment of an incumbent's job proficiency and that both are
needed  to  obtain  maximum  job  coverage. 

An  investigation  by  Rumsey,  Osborn  and  Ford  (1985)  used  meta-analytic 
procedures to examine the relationship between hands-on and job knowledge
tests.  Excluding  investigations  which  used  a  language-oriented  work  sample, 
they  found  a  mean  correlation  of  .57,  adjusted  for  attenuation,  between 
hands-on  and  job  knowledge  tests.  This  correlation  suggests  some  degree  of 
overlap  but  not  total  interchangeability. 

Are there factors which might substantially moderate the correlation
between the two types of measures? Rumsey, et al. (1985) found some evidence
that type of work sample had an impact, as correlations for investigations
using verbal performance tests tended to exceed those in investigations using
motor performance tests. These investigators also found limited support for
the proposition that type of occupation influences the correlation obtained.
However, much remains to be learned about potential moderating factors.

This research was funded by the U.S. Army Research Institute for the
Behavioral and Social Sciences, Contract No. MDA903-82-C-0531. All
statements expressed in this paper are those of the authors and do not
necessarily express the official opinions or policies of the U.S. Army
Research Institute or the Department of the Army.

Vineberg  and  Taylor  (1972)  have  suggested  that  the  extent  to  which  a  job 
requires  skill  is  an  important  consideration  in  examining  correlations 
between  knowledge  and  work  sample  measures.  They  noted  that  skill,  unlike 
knowledge,  can  only  be  acquired  through  practice.  Job  knowledge  tests  are 
presumably  best  suited  to  measure  knowledge;  performance  tests  are  presumably 
best  suited  to  measure  job  skills.  For  those  jobs  in  which  task  requirements 
can  be  reduced  to  job  knowledge,  the  correspondence  between  the  two  types  of 
measures  should  be  high;  for  those  in  which  skill  is  an  important 
requirement,  the  correspondence  should  be  lower. 

The  effort  reported  here  involved  first  identifying  the  skills  that  are 
required  to  perform  hands-on  tasks  that  are  tested  in  nine  military 
occupational specialties (MOS) in Project A. Then, the extent to which these
requirements  moderate  correlations  between  job  knowledge  and  hands-on  test 
scores  was  determined. 


Method 


Occupations  (MOS).  Performance  tests  and  job  knowledge  tests  were 
developed  for  nine  Army  occupations,  or  Military  Occupational  Specialties 
(MOS).  These  MOS  were  selected  to  be  as  representative  of  the  full  set  of 
entry-level MOS as possible, covering the range of job content, Career
Management Fields, and ASVAB Aptitude Area prerequisites. The MOS are shown
in  Table  1. 

Task  Selection.  For  each  MOS,  selection  of  tasks  from  the  job  domain 
proceeded  according  to  four  criteria:  the  tasks  should  cover  the  job  content 
areas,  they  should  be  the  relatively  more  important  ones,  they  should  permit 
variability  of  performance,  and  they  should  not  be  of  very  low  performance 
frequency. 

Test Development. Fifteen tasks in each MOS were selected for
performance  testing  based  on  such  factors  as  number  of  cued  steps  and  degree 
of  skill  required.  Performance  tests  were  developed  to  score  the  soldier  on 
whether  each  step  of  the  task  was  performed  correctly,  and  to  provide 
standard  conditions  and  instructions  for  the  testing.  Multiple-choice  format 
job knowledge tests were also developed for those tasks in each MOS. All
tests  were  pilot-tested,  and  later  field-tested  on  114  to  178  soldiers  in 
each  MOS.  Results  from  those  administrations  were  used  to  revise  the  tests; 
in  some  cases,  hands-on  tests  or  job  knowledge  tests  were  dropped. 

Data  Collection.  Between  June  and  November,  1985,  the  hands-on  and 
knowledge  tests  were  administered  to  over  5000  skill  level  1  soldiers  in  the 
nine  MOS,  at  14  sites  in  the  U.S.  and  Europe.  (This  was  Project  A's 
Concurrent  Validation  phase.)  The  numbers  of  soldiers  tested  in  each  MOS  are 
shown  in  Table  1.  Job  knowledge  tests  were  administered  by  project  staff; 
actual  scoring  of  the  performance  tests  was  done  by  NCO,  trained  in  scoring 
procedures  by  project  staff. 




Table 1

MOS Selected for Testing and Numbers of Soldiers Tested

MOS                                          Number Tested

11B   Infantryman                                 662
13B   Cannon Crewman                              586
19E   Tank Crewman                                434
31C   Single Channel Radio Operator               303
63B   Light Wheel Vehicle Mechanic                541
64C   Motor Transport Operator                    629
71L   Administrative Specialist                   481
91A   Medical Specialist                          480
95B   Military Police                             638

Knowledge/Proficiency  Assignments.  Three  project  staff  who  had  been 
involved  in  test  development  and  had  served  as  hands-on  test  managers  during 
the  Concurrent  Validation  testing  independently  sorted  the  hands-on  steps 
into  one  of  the  three  categories:  knowledge,  simple  motor,  or  complex  motor. 
The level of agreement among the judges was around 80% across the nine MOS;
disagreements  were  resolved  by  discussing  the  assignments  among  the  three 
judges. 

Because each performance test score was the percent of steps performed
correctly, we classified the tests as K (Knowledge) if at least half of the
steps had been sorted into the knowledge category, and as P (Proficiency) if
half or more of the steps were in the two proficiency categories. The P
tasks were further categorized as P1 (simple motor tasks where manipulation
is trivial, easy to perform, and easily learned) if more steps were in the P1
category than in either of the other two categories, or as P2 (complex motor
tasks which require more than two trials to perform well) if more steps were
in the P2 category than either of the other two categories. Tasks where the
number of P1 and P2 steps were the same, or where neither P1 nor P2
outnumbered the K steps, were held out of analyses that compared those two
levels of categorization.
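The categorization rule above can be sketched as a small function. This is an
illustration only: the function name and return format are hypothetical, and
because the text assigns a task with exactly half knowledge steps to both K
and P, the sketch assumes such ties go to K.

```python
def classify_task(k_steps, p1_steps, p2_steps):
    """Return (category, proficiency_level) for one hands-on test.

    category is "K" or "P"; proficiency_level is "P1", "P2", or None when
    the task is K or is held out of the P1 versus P2 comparisons.
    """
    total = k_steps + p1_steps + p2_steps
    if total == 0:
        raise ValueError("task has no steps")
    # Assumption: a task with exactly half knowledge steps is treated as K.
    if 2 * k_steps >= total:
        return ("K", None)
    if p1_steps > p2_steps and p1_steps > k_steps:
        return ("P", "P1")
    if p2_steps > p1_steps and p2_steps > k_steps:
        return ("P", "P2")
    # P1 and P2 tied, or neither outnumbers K: P overall, held out of P1/P2.
    return ("P", None)

# Hypothetical step counts: 4 knowledge, 3 simple motor, 3 complex motor.
example = classify_task(4, 3, 3)
```

Tasks returned with a level of None correspond to the cases counted under
Total P but under neither P1 nor P2.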

Table 2 shows the number of tasks in each MOS that were tested in both
the performance mode and the job knowledge mode, and the number of tasks
where the performance test was categorized as K, P1, or P2.

Data Analysis. The nine MOS had between 14 and 17 tasks tested in both
the job knowledge and performance modes. For each task, the scores used were
the percent of steps performed correctly and the percent of items answered
correctly. These scores were then correlated by task across the soldiers in
each MOS. After the correlations were transformed to Fisher z scores, they
were entered into an analysis of variance, with the nine MOS and the
knowledge/proficiency categories as independent variables.


Results 


Table 2

Number of Tasks Tested in Performance and Job Knowledge Modes
and Number of Tasks Assigned to Knowledge/Proficiency Categories
for Nine MOS

MOS                                   Tasks    K    P1    P2   Total P*

11B   Infantryman                       12     2     7     2     10
13B   Cannon Crewman                    17     2     8     7     15
19E   Tank Crewman                      14     5     7     1      9
31C   Single Channel Radio Operator     15    10     4     0      5
63B   Light Wheel Vehicle Mechanic      15     4     4     6     11
64C   Motor Transport Operator          14     3     5     5     11
71L   Administrative Specialist         12     4     1     7      8
91A   Medical Specialist                15     6     6     1      9
95B   Military Police                   16     8     4     3      8

*Includes tasks not clearly P1 or P2; see text.

Table 3 presents the means and standard deviations of the correlations
between performance tests and job knowledge tests for each of the nine MOS;
the statistics are also shown for the groupings of tasks based on
knowledge/proficiency category assignments. (The correlations had been
transformed, using the Fisher z transformation, before calculating the
summary statistics; the results shown in Table 3, however, have been
transformed back to Pearson correlations.) In eight of the MOS, the
individual task correlations ranged from about .00 to .40; in one MOS, the
highest correlation was .19. (Task correlations tend to be substantially
lower than correlations for entire jobs; hence, the level of these
correlations cannot be meaningfully compared with earlier findings.) With
the large number of soldiers tested in each MOS, even small correlations
(around .08) are significant at the .05 level. Over two-thirds of the
correlations in every MOS were significant at that level.
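The remark that correlations around .08 reach significance can be checked
roughly with the Fisher z approximation. The sketch below is an assumption
about how such a threshold might be computed, not the authors' method; the
function name is hypothetical, and the sample sizes are the smallest and
largest MOS samples listed in Table 1.

```python
import math

def critical_r(n, z_crit=1.959964):
    """Approximate smallest |r| significant at the two-tailed .05 level.

    Uses the Fisher z approximation, in which z has standard error
    1 / sqrt(n - 3); z_crit is the two-tailed .05 normal critical value.
    """
    return math.tanh(z_crit / math.sqrt(n - 3))

threshold_largest_mos = critical_r(662)   # 11B sample size from Table 1
threshold_smallest_mos = critical_r(303)  # 31C sample size from Table 1
```

Under this approximation the threshold is just under .08 for the largest
sample and somewhat above .11 for the smallest, so the ".08" figure applies
most closely to the larger MOS samples.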

Two analyses of variance were calculated, using the transformed
correlations (Fisher z) as the dependent variable. In the first ANOVA, the
nine MOS and the two knowledge/proficiency categories (K and P) were the
independent variables. The second ANOVA likewise used MOS, and also the
three levels of the knowledge/proficiency categorization (with two levels of
proficiency, simple motor (P1) and complex motor skills (P2)), as the
independent variables. Both ANOVA results are summarized in Table 4.

In both analyses, the main effect for MOS was nonsignificant, and the
interaction terms were not significant. In both analyses, the
knowledge/proficiency term was significant. Where knowledge/proficiency was
considered on only two levels, the difference favored the K tasks, where the
performance test had been categorized as predominantly knowledge. In the
second analysis, where there were three groups of tasks - knowledge (K),
simple motor (P1), and complex motor (P2) - comparisons of the means of those
groups revealed that only the difference between K tasks and P1 tasks was
significant at the .01 level (F = 14.33, df = 2,95); K tasks and P2 tasks
differed slightly (F = 6.68, df = 2,95, p < .10), as did K tasks and the
combined group of P1 tasks and P2 tasks (F = 7.581, df = 3,95, p < .10). The
difference between P1 and P2 tasks was not significant.




Table 3

Means and Standard Deviations of Performance x Job Knowledge Test
Correlations by Knowledge/Proficiency Category for Nine MOS

MOS                                          Tasks     K     P1     P2   Total P

11B  Infantryman                     N         12      2      7      2     10
                                     Mean     .17    .26    .18    .09    .15
                                     S.D.     .37    .14    .16    .02    .14

13B  Cannon Crewman                  N         17      2      8      7     15
                                     Mean     .17    .20    .16    .17    .16
                                     S.D.     .11    .07    .11    .13    .12

19E  Tank Crewman                    N         14      5      7      1      9
                                     Mean     .14    .23    .09    .12    .10
                                     S.D.     .13    .19    .07     -     .06

31C  Single Channel Radio Operator   N         15     10      4      0      5
                                     Mean     .20    .22    .15     -     .15
                                     S.D.     .14    .17    .03     -     .03

63B  Light Wheel Vehicle Mechanic    N         15      4      4      6     11
                                     Mean     .10    .10    .07    .10    .10
                                     S.D.     .04    .04    .02    .03    .04

64C  Motor Transport Operator        N         14      3      5      5     11
                                     Mean     .15    .26    .11    .09    .12
                                     S.D.     .12    .20    .09    .05    .09

71L  Administrative Specialist       N         12      4      1      7      8
                                     Mean     .24    .30    .16    .20    .20
                                     S.D.     .11    .13     -     .09    .09

91A  Medical Specialist              N         15      6      6      1      9
                                     Mean     .17    .17    .15    .33    .17
                                     S.D.     .13    .18    .08     -     .09

95B  Military Police                 N         16      8      4      3      8
                                     Mean     .15    .18    .10    .10    .11
                                     S.D.     .11    .11    .06    .17    .10

Across MOS                           N        130     44     46     32     86
                                     Mean     .16    .21    .13    .14    .14
                                     S.D.     .12    .15    .10    .11    .10



Table 4

Analysis of Variance Summary Tables for MOS x Knowledge/Proficiency

SOURCE                     SS      df     MS     Ratio      F       p

[1] MOS                   .156      8    .020    [1/4]     1.45   <.25
[2] K/P                   .137      1    .137    [2/3]    18.08   <.01
[3] MOS x K/P             .061      8    .008    [3/4]      .57    NS
[4] Within cell          1.499    112    .013

MOS x Knowledge/Simple Motor/Complex Motor

SOURCE                     SS      df     MS     Ratio      F       p

[1] MOS                   .142      8    .018    [1/4]     1.20    NS
[2] K/P1/P2               .108      2    .054    [2/3]     4.90   <.05
[3] MOS x K/P1/P2         .161     15*   .011    [3/4]      .73    NS
[4] Within cell          1.388     95    .015

*Reduced by 1 df for missing cell estimation.
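The bracketed ratios in Table 4 indicate which mean squares form each F. As
a worked check, the sketch below recomputes the F ratios for the first
summary table from its sums of squares and degrees of freedom. The variable
names are hypothetical, and because the printed SS values are rounded, the
recomputed ratios are expected to be close to the printed ones rather than
identical.

```python
def mean_square(sum_of_squares, degrees_of_freedom):
    """Mean square = SS / df."""
    return sum_of_squares / degrees_of_freedom

# SS and df from the first summary table in Table 4 (MOS x K/P).
ms_mos = mean_square(0.156, 8)
ms_kp = mean_square(0.137, 1)
ms_interaction = mean_square(0.061, 8)
ms_within = mean_square(1.499, 112)

f_mos = ms_mos / ms_within                  # ratio [1/4]
f_kp = ms_kp / ms_interaction               # ratio [2/3]
f_interaction = ms_interaction / ms_within  # ratio [3/4]
```

Note that the K/P effect is tested against the MOS x K/P interaction rather
than the within-cell term, which is what the [2/3] entry denotes.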


Discussion 


There is fairly clear evidence here that the differentiation between
knowledge requirements and proficiency requirements on hands-on performance
tests explains some of the variability in correlations between the two modes
of testing. When the steps required on the performance tests are primarily
knowledge mediated, and are demonstrations of the acquisition of task
knowledge, then the correlations with written tests of the tasks are higher
than when most of the performance test steps require demonstration of
psychomotor skill, however simple.

Further analyses, already underway, will involve meta-analysis of the
obtained correlations, and an examination of the knowledge/proficiency
distinction as a possible moderator variable.


REFERENCES 

Rumsey, M. G., Osborn, W. C., & Ford, P. (1985). Comparing work sample and
job knowledge measures. Paper presented at the annual conference of the
American Psychological Association, Los Angeles.

Vineberg, R., & Taylor, E. N. (1972). Performance in four Army jobs by men at
different aptitude (AFQT) levels: 4. Relationships between performance
criteria (Technical Report 72-23, pp. 17, 19). Alexandria, VA: Human
Resources Research Organization.

Vineberg, R., & Taylor, E. N. (1978). Alternatives to performance testing:
Tests of task knowledge and ratings (Professional Paper 6-78).
Alexandria, VA: Human Resources Research Organization.




THE  DEVELOPMENT  OF  A  MODEL  OF  THE 


PROJECT  A  CRITERION  SPACE 


John  P.  Campbell 

Human  Resources  Research  Organization 

Lawrence  M.  Hanser 
U.S.  Army  Research  Institute 

Lauress  Wise 

American  Institutes  for  Research 


Presented  on  Symposium, 

"Project  A  Concurrent  Validation:  Preliminary  Results" 

At  the  Annual  Conference  of  the 
Military  Testing  Association 
Mystic,  Connecticut 

November  1986 


The  views  expressed  in  this  paper  are  those  of  the  authors  and  do  not 
necessarily  reflect  the  official  opinions  and  policies  of  the  U.S.  Army 
Research  Institute  or  the  Department  of  the  Army. 




THE  DEVELOPMENT  OF  A  MODEL  OF  THE 
PROJECT  A  CRITERION  SPACE^ 


John P. Campbell
University of Minnesota

Lawrence M. Hanser
Army Research Institute

Lauress Wise
American Institutes for Research

Conceptual Background

The goals of performance measurement in Project A are to define, or
model,  the  total  domain  of  performance  in  some  reasonable  way  and  then  develop 
reliable  and  valid  measures  of  each  major  factor.  The  performance  measures 
are  to  serve  as  criteria  for  validating  selection/classification  tests,  and 
not,  at  this  point,  as  operational  appraisals. 

Some additional specific goals are to: a) make a state-of-the-art
attempt to develop job sample or "hands-on" measures of job task proficiency,
b) compare hands-on measurement to paper-and-pencil tests and rating measures
of proficiency on the same tasks (i.e., a multi-trait, multi-method approach),
c) develop standardized measures of training achievement for the purpose of
determining the relationship between training performance and job performance,
and d) evaluate existing archival and administrative records as possible
indicators of job performance.

Given these intentions, the criterion development effort focused on three
major methods: hands-on job sample tests, multiple choice knowledge tests,
and ratings. The behaviorally anchored rating scale (BARS) procedure was
extensively used in the development of the rating methods.

Modeling  Performance 

The  development  efforts  to  be  described  were  guided  by  a  particular 
"theory"  of  performance.  The  basic  outline  is  as  follows. 

First, job performance really is multi-dimensional. There is not one
outcome,  one  factor,  or  one  anything  that  can  be  pointed  to  and  labeled  as  job 
performance.  It  is  manifested  by  a  wide  variety  of  behaviors,  or  things 
people  do,  that  are  judged  to  be  important  for  accomplishing  the  goals  of  the 
organization  (Army). 

Two  General  Factors 


For the population of entry level enlisted positions we postulated that
there are two major types of job performance components. The first are
specific to a particular job. That is, measures of such components would
reflect specific technical competence or specific job behaviors that are not
required for other jobs. We anticipated that there would be a relatively
small number of distinguishable factors of technical performance that would be
a function of different abilities or skills.

^This research was funded by the U.S. Army Research Institute for the
Behavioral and Social Sciences, Contract No. MDA903-82-C-0531. All statements
expressed in this paper are those of the authors and do not necessarily
express the official opinions or policies of the U.S. Army Research Institute
or the Department of the Army.

The second kind of performance factor includes components that are
defined  and  measured  in  the  same  way  for  every  job.  These  are  referred  to  as 
Army-wide  criterion  factors  and  incorporate  the  basic  notion  that  total 
performance  is  much  more  than  task  or  technical  proficiency.  It  might  include 
such  things  as  contributions  to  teamwork,  continual  self-development,  support 
for  the  norms  and  customs  of  the  organization,  and  perseverance  in  the  face  of 
adversity. 

Factors  vs.  a  Composite 

Saying that performance is multi-dimensional does not preclude using just
one  index  of  an  individual's  contributions  to  make  a  specific  personnel 
decision  (e.g.,  select/not  select,  promote/not  promote).  As  argued  by  Schmidt 
and  Kaplan  (1971)  some  years  ago,  it  seems  quite  reasonable  for  the 
organization  to  scale  the  importance  of  each  major  performance  factor  relative 
to  a  particular  personnel  decision  that  must  be  made  and  to  combine  the 
weighted  factor  scores  into  a  composite  that  represents  the  total  contribution 
or  utility  of  an  individual's  performance,  within  the  context  of  that 
decision. 
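The idea of combining weighted factor scores into a decision-specific
composite can be sketched as follows. The factor names, weights, and the
choice to normalize by the sum of the weights are all hypothetical; the paper
does not specify how the composite is formed.

```python
def composite_score(factor_scores, importance_weights):
    """Weighted average of factor scores for one personnel decision.

    factor_scores maps factor name -> standardized score for one person;
    importance_weights maps factor name -> judged importance for the decision.
    """
    total_weight = sum(importance_weights[name] for name in factor_scores)
    weighted_sum = sum(
        score * importance_weights[name]
        for name, score in factor_scores.items()
    )
    return weighted_sum / total_weight

# Hypothetical scores and weights (not values from the project).
scores = {"technical": 1.0, "teamwork": 0.0}
promotion_weights = {"technical": 3.0, "teamwork": 1.0}
promotion_composite = composite_score(scores, promotion_weights)
```

A different decision would use the same factor scores with a different set
of weights, which is what allows one multi-dimensional performance model to
serve several kinds of personnel decisions.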

A  Structural  Model 


If  performance  is  characterized  in  the  above  manner,  then  a  more  formal 
way  to  model  performance  is  to  think  in  terms  of  its  latent  structure, 
postulate  what  that  might  be,  and  then  resort  to  a  confirmatory  analysis. 
Within  limits,  this  is  what  we  tried  to  do.  Unfortunately,  it  is  true  that  we 
simply  know  a  lot  more  about  predictor  constructs  than  we  do  about  job 
performance  constructs.  There  are  volumes  of  research  on  the  former,  and 
almost none on the latter.

Unit  vs.  Individual  Performance 


Finally,  people  do  not  usually  work  alone.  Individuals  are  members  of 
work  groups  or  units  and  it  is  the  unit's  performance  that  frequently  is  the 
most  central  concern.  Project  A  has  not  incorporated  unit  effectiveness  in 
its  model  of  performance.  The  project  is  focused  on  the  development  of  a  new 
selection/classification  system  for  entry  level  personnel  and  is  concerned 
with  improving  personnel  decisions  about  individuals  and  not  units.  The  task 
is  to  maximize  the  average  payoff  per  individual  selected. 

What  we  have  chosen  to  do  is  to  try  to  identify  the  factors,  or  means,  by 
which  individuals  contribute  to  unit  performance  and  to  assess  individual 
performance  on  those  factors  via  rating  methods. 

Criterion  Development 

Actual criterion development proceeded from two basic types of
information. First, all available task descriptions were used to generate a
population of job tasks for each MOS. The principal sources of task
descriptions are the Army's periodic job description surveys and the Soldier's
Manual for each MOS, which is a specification by management of what the task
content of the job is supposed to be. After much editing, revision to ensure
nonredundancy and a uniform level of generality, and a formal review by a
panel of subject matter experts, a population of 130-180 tasks was enumerated
for each MOS.

An  additional  series  of  expert  judgments  was  then  used  to  scale  the 
relative  difficulty  and  importance  of  each  task  and  to  cluster  tasks  on  the 
basis  of  content  similarity.  Sampling  tasks  for  measurement  was  accomplished 
via  a  kind  of  Delphi  procedure.  That  is,  each  member  of  a  team  of  task 
selectors  was  asked  to  select  30  tasks  from  the  population  of  tasks  such  that 
those  selected  were  representative  of  task  content,  were  important,  and 
represented  a  range  of  difficulty.  The  individual  judge's  choices  were  then 
regressed  on  the  task  characteristics  and  both  the  choices  and  the  captured 
"policy"  of  each  person  were  fed  back  to  the  group  members,  who  each  revised 
their  choices  as  they  saw  fit.  The  consensus  of  the  task  selection  panel  was 
then  thoroughly  reviewed  by  the  Army  command  responsible  for  that  particular 
job.  This  last  review  was  the  "final"  word  on  the  representativeness  of  task 
samples  and  produced  a  sample  of  30  tasks  for  each  job. 
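
The "policy capturing" step can be sketched as an ordinary least squares regression of one judge's select/not-select choices on the scaled task characteristics. The paper does not give the exact regression specification, so the two predictors, the task ratings, and the choices below are assumptions made only for illustration.

```python
# Hypothetical policy-capturing sketch: regress one judge's 0/1 task
# choices on task importance and difficulty (all data are invented).

def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x

def capture_policy(tasks, chosen):
    """Return OLS weights [intercept, importance, difficulty]."""
    rows = [[1.0, t["importance"], t["difficulty"]] for t in tasks]
    k = 3
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, chosen)) for i in range(k)]
    return solve_linear_system(xtx, xty)

# Expert-scaled characteristics for eight tasks and one judge's choices.
ratings = [(5, 2), (4, 3), (2, 1), (1, 4), (3, 3), (5, 5), (2, 2), (4, 1)]
tasks = [{"importance": i, "difficulty": d} for i, d in ratings]
chosen = [1, 1, 0, 0, 1, 1, 0, 1]

intercept, w_importance, w_difficulty = capture_policy(tasks, chosen)
print(f"importance weight: {w_importance:.2f}, difficulty weight: {w_difficulty:.2f}")
```

The fitted weights summarize how heavily this judge leaned on each characteristic, which is the kind of "policy" that could be fed back to the panel alongside the raw choices.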

Standardized job samples, paper-and-pencil job knowledge tests, and
numerical rating scales were then constructed to assess knowledge and
proficiency  on  these  tasks.  Each  measure  went  through  multiple  rounds  of 
pilot  testing  and  revision.  The  job  sample  tests  were  fairly  elaborate  and 
were  composed  of  multiple  test  stations  sometimes  spread  over  a  football  field 
size  area.  Because  of  time  limitations  (4  hours),  only  15  of  the  tasks  could 
be  tested  hands-on. 

The  second  procedure  used  to  describe  job  content  was  the  critical 
incident method. Panels of NCOs and officers generated thousands of critical
incidents  of  effective  and  ineffective  performance.  There  were  two  basic 
formats  for  the  critical  incident  workshops.  One  asked  participants  to 
generate  incidents  that  potentially  could  occur  in  any  job.  The  second  type 
focused  on  incidents  that  were  specific  to  the  content  of  the  particular  job 
under  consideration.  The  behaviorally  anchored  rating  scale  procedure  was 
used  to  construct  rating  scales  for  performance  factors  specific  to  a 
particular job (MOS-specific BARS) and performance factors that were defined
in the same way and relevant for all jobs (Army-wide BARS). The critical
incident  procedure  was  also  used  with  workshops  of  combat  veterans  to  develop 
rating  scales  of  "expected"  combat  effectiveness. 

Since  one  major  objective  was  to  determine  the  relationships  between 
training  performance  and  job  performance  and  their  differential 
predictability,  if  any,  a  comprehensive  training  achievement  test  was 
constructed  for  each  MOS  by  carefully  matching  the  content  of  the  program  of 
instruction  (POI)  with  the  content  of  the  population  of  job  tasks,  and  writing 
items  to  represent  each  segment  of  the  match. 

The  final  entry  in  the  array  of  criterion  measures  was  produced  by  a 
concerted  effort  to  get  what  we  could  from  the  files  or  archival  records.  We 
began  by  enumerating  all  possibilities  from  three  major  sources  of  such 
records:  the  enlisted  master  file,  the  enlisted  military  personnel  file,  and 
the  military  personnel  records  jacket  (the  201  File). 




We systematically compared these three sources using a sample of 750
people and a standardized information recording form. The 201 File looked the
most promising in terms of recency and completeness but is by far the most
expensive to search. As a consequence, we collected eight archival performance
indicators via a self-report questionnaire. That is, people were asked what
their personnel file contained with regard to letters of commendation,
disciplinary actions, etc. Field tests on a sample of 500 people showed
considerable agreement between self-report and archival records for both
positive and negative entries. Further follow-up questionnaires and
interviews suggested that self-report may be the more accurate source. The
self-report items were combined into five indicators that were actually used
as criterion measures.


Determining  Actual  Criterion  Scores 

The  first  step  in  our  analyses  was  to  identify  the  basic  criterion  scores 
whose  structure  we  would  analyze.  If  all  the  rating  scales  are  used 
separately  and  the  MOS-specific  measures  are  aggregated  at  the  task  or 
instructional  module  level,  there  are  approximately  200  criterion  scores  on 
each  individual.  Some  aggregation  was  needed. 

Reduction  of  the  Hands-On  and  Written  Variables 


The  30  tasks  sampled  for  each  job  were  clustered  via  expert  judgment  into 
8  to  15  functional  categories  on  the  basis  of  similarity  of  task  content. 

Each  of  the  school  knowledge  items  was  similarly  mapped  into  a  specific 
functional  category. 

Ten of the functional categories were common to some or all of the jobs
(e.g., first aid, basic weapons, field techniques). Each job also had two to
five functional performance categories that were unique.

After  category  scores  were  computed,  separate  factor  analyses  were 
executed  for  each  type  of  measure  within  each  job.  There  were  several  common 
features in the results. First, the unique functional categories for each job
tended  to  load  on  different  factors  than  the  common  functional  categories. 
Second,  the  factors  that  emerged  from  the  common  functional  categories  tended 
to  be  fairly  similar  across  the  nine  different  jobs  and  across  the  three 
methods. 

Using the empirical factor analysis to guide us, we adopted a set of
content categories which became the performance test scores used in subsequent
analyses.

Reduction  of  the  Rating  Variables 

The  individual  rating  scales  were,  for  the  most  part,  highly  reliable. 
Empirical  factor  analyses  of  the  Army-wide  rating  scales  suggested  three 
factors.  These  were: 

1. Effort/Leadership, including effort and competence in performing
job tasks, leadership, and self-development.

2. Maintaining Personal Discipline, including self-control,
integrity, and following regulations.

3. Physical Fitness and Military Bearing, including physical
fitness and maintaining proper military bearing and appearance.

Similar  factor  analyses  were  reviewed  for  the  job-specific  scales  for 
each  job.  Two  factors  were  identified  based  on  these  results.  The  first 
consisted  of  those  aspects  of  job  performance  that  were  central  to  the 
specific  technical  content  of  each  job.  The  second  factor  included  the 
remaining,  less  central  job  performance  components. 

The  individual  items  in  the  combat  performance  prediction  battery  also 
were  subjected  to  an  empirical  factor  analysis.  Two  factors  emerged.  The 
first  factor  consisted  of  items  depicting  exemplary  effort,  skill,  or  courage 
under  stressful  conditions.  The  second  factor  consisted  of  negatively  worded 
items  portraying  failure  to  follow  instructions  and  lack  of  discipline  under 
stressful  conditions. 


Building  the  Target  Model 

The  next  step  was  to  build  a  target  model  of  job  performance  that  could 
be  tested  for  goodness  of  fit  within  each  of  our  nine  jobs.  The  project  began 
with an initial model of performance (Borman, Motowidlo, Rose, & Hanser, in
press) which had been modified on the basis of field test data (Campbell &
Harris, 1985). Principal components factor analyses within MOS were used to
suggest  further  modifications. 

Several  consistent  results  were  observed.  First,  the  expected  "method" 
factors  appeared,  specifically  one  factor  for  the  ratings  and  one  for  the 
written  tests.  The  evidence  for  a  "hands-on"  method  factor  was  less 
compelling.  Second,  the  nature  of  the  substantive  factors  tended  to  be 
similar across MOS.

Based  on  the  empirical  analyses,  a  revised  model  was  constructed  to 
account  for  the  correlations  among  our  performance  measures.  This  model 
included  five  job  performance  constructs  and  two  measurement  method  factors. 

Confirming  the  Model  Within  Each  Job 

The  next  step  in  the  analysis  was  to  conduct  separate  tests  of  goodness 
of  fit  of  this  target  model  within  each  of  the  nine  jobs.  This  was  done  using 
the LISREL confirmatory factor analysis program (Jöreskog & Sörbom, 1981).

In conducting a confirmatory factor analysis with LISREL, it is necessary
to specify the structure of three different parameter matrices: the
hypothesized factor structure matrix (a matrix of regression coefficients for
predicting the observed variables from the underlying latent constructs); the
matrix of uniqueness or error components (and intercorrelations); and a matrix
of covariances among the factors. In these analyses, we set the diagonal
elements of the covariance matrix to one, forcing a "standardized" solution.
This meant that the off-diagonal elements would represent the correlations
among and between our performance constructs and method factors. We further
specified that the correlation between the two method factors and each
performance construct should be zero. This effectively defined each method
factor as that portion of the common variance among measures from the same
method that was not predictable from (i.e., correlated with) any of the other
related factor or performance construct scores.
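
In conventional confirmatory factor analysis notation, the three matrices combine into the model-implied covariance matrix of the observed scores. The symbols below are the customary LISREL ones and are an assumption here, since the paper does not display its equations; the zero constraint is read as applying to each pairing of a method factor with a performance construct.

```latex
\Sigma = \Lambda \Phi \Lambda^{\top} + \Theta,
\qquad \operatorname{diag}(\Phi) = \mathbf{1},
\qquad \phi_{jm} = 0 \ \text{for each performance construct } j
\text{ and method factor } m .
```

Here $\Lambda$ is the factor structure matrix, $\Theta$ holds the uniqueness (error) components, and $\Phi$ is the factor covariance matrix, which becomes a correlation matrix once its diagonal is fixed to one.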




To be perfectly clear, the approach we used was obviously not purely confirmatory;
the hypothesized target model was based in part on analyses of these same data.

Confirmation  of  the  Overall  Model 

Given the prior examination of the data described above, the
results of the confirmatory procedures applied to each job seemed to support a
common structure of job performance. The procedures also yielded reasonably
similar estimates of the intercorrelations among the constructs and of the loadings of
the observed variables on these constructs across the nine jobs.

The  final  step  in  our  analyses  was  to  determine  whether  the  variation  in  some  of 
these  parameters  across  jobs  could  be  attributed  to  sampling  variation.  The  specific 
model  that  we  explored  stated  that:  (1)  the  correlation  among  factors  was  invariant 
across  jobs  and  (2)  the  loadings  of  all  of  the  Army-wide  measures  on  the  performance 
constructs  and  on  the  rating  method  factor  were  also  constant  across  jobs. 

The overall model fit extremely well. The root mean square residual was .047, and
the chi-square was 2508.1. There were 2403 degrees of freedom after adjusting for
missing variables and the use of the data in estimating uniqueness. This yields a
significance level of .07, which is not enough to reject the model at the .05 level.
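
The reported significance level can be checked from the chi-square value and its degrees of freedom. The sketch below uses the Wilson-Hilferty normal approximation so that it needs only the standard library; this approximation is a substitute chosen for illustration and is not necessarily how the original value was computed.

```python
import math

def chi_square_upper_tail(x, df):
    """Approximate P(X >= x) for a chi-square variate (Wilson-Hilferty)."""
    h = 2.0 / (9.0 * df)
    # The cube root of x/df is approximately normal with mean 1 - h, variance h.
    z = ((x / df) ** (1.0 / 3.0) - (1.0 - h)) / math.sqrt(h)
    # Upper-tail probability of the standard normal distribution.
    return 0.5 * math.erfc(z / math.sqrt(2.0))

p_value = chi_square_upper_tail(2508.1, 2403)
print(f"approximate p = {p_value:.3f}")
```

With degrees of freedom this large the approximation is close, and the result rounds to the .07 reported above.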

Summary and Discussion

Some aspects of the final structure are noteworthy. First, in spite of some
confounding  with  measurement  method,  the  latent  performance  structure  appears  to 
be  composed  of  very  distinct  components.  It  is  reasonable  to  expect  that  the  different 
performance  constructs  should  be  weighted  in  forming  an  overall  appraisal  of 
performance  for  use  in  personnel  decisions.  Using  regression  techniques  to  partial 
the  methods  factors  from  the  substantive  factors  should  also  tell  us  more  about  what 
does  or  does  not  predict  the  residual  variance. 

Finally,  since  (a)  the  five-factor  solution  is  stable  across  jobs  sampled  from  this 
population,  (b)  the  performance  constructs  seem  to  make  sense,  and  (c)  the 
constructs  are  based  on  measures  carefully  developed  to  be  content  valid,  it  seems 
safe  to  ascribe  some  degree  of  construct  validity  to  them. 

References 


Borman, W. C., Motowidlo, S. J., Rose, S. R., & Hanser, L. M. (in press).
Development of a model of soldier effectiveness (Technical Report 741).
Alexandria, VA: U.S. Army Research Institute.

Campbell, J. P., & Harris, J. H. (1985). Criterion reduction and combination via a
participative decision-making panel. Paper presented at the 93rd Annual Meeting
of the American Psychological Association, Los Angeles.

Jöreskog, K. G., & Sörbom, D. (1981). LISREL VI: Analysis of linear structural
relationships by maximum likelihood and least squares methods. Uppsala, Sweden:
University of Uppsala.

Schmidt, F. L., & Kaplan, L. B. (1971). Composite vs. multiple criteria: A review and
resolution of the controversy. Personnel Psychology, 24, 419-434.




Table 1


Summary of Criterion Measures Used in Batch A
and Batch Z Concurrent Validation Samples


•  Job Sample (Hands-On) tests of MOS-specific task proficiency.

   -  Individual is tested on each of 15 major job tasks in
      an MOS.

•  Paper-and-pencil job knowledge tests designed to measure
   task-specific job knowledge.

   -  Individual is scored on 150 to 200 multiple-choice
      items representing 30 major job tasks. Ten to 15 of
      the tasks were also measured hands-on.

•  Rating scale measures of specific task performance on the 15
   tasks also measured with the knowledge tests. Most of the rated
   tasks were also included in the hands-on measures.

•  MOS-specific behaviorally anchored rating scales (BARS). From 6
   to 10 BARS were developed for each MOS to represent the major
   factors that constitute job-specific technical and task
   proficiency.


Performance Measures for Batch Z Only

•  Army-wide Rating Scales (all obtained from both supervisors and
   peers).

   -  Ratings of performance on 11 common tasks (e.g., basic
      first aid).

   -  Single scale rating on performance of specific job
      duties.

Auxiliary Measures Included in Criterion Battery

•  A Job History Questionnaire which asks for information about
   frequency and recency of performance of the MOS-specific tasks.

•  Work Questionnaire: a 44-item questionnaire scored on 14
   dimensions descriptive of the job environment.

•  Measurement Method Rating obtained from all participants at the
   end of the final testing session.




Table  2 


Six basic functional categories of job performance and
knowledge obtained from factor analyses of hands-on job
sample tests and paper-and-pencil knowledge tests.


1. Basic Soldiering Skills (field techniques, weapons, navigate,
customs and laws).

2. Safety/Survival (first aid, nuclear-biological-chemical
safety).

3.  Communications  (radio  operation). 

4.  Vehicle  Maintenance. 

5.  Identify  Friendly/Enemy  Aircraft  and  Vehicles. 

6.  Technical  Skills  (specific  to  the  Job). 




Table 3


Performance factors representing the common latent structure
across all jobs in Project A sample. The criterion
measures that comprise each factor are as indicated.


1) Task Proficiency: MOS (job) specific core technical skills: The
proficiency with which the individual performs the tasks which are "central" to
his or her job (MOS). The tasks represent the core of the job and they are
its primary definers from job to job.

   •  The subscales representing core content in both the knowledge
      tests and the job sample tests that loaded on this factor were
      summed within method, standardized, and then added together for
      a total factor score. The factor score does not include any
      rating measures.

2) Task Proficiency: General or common skills: In addition to the core
technical content specific to an MOS, individuals in every MOS are responsible for
being able to perform a variety of general or common tasks, e.g., use of
basic weapons, first aid, etc. This factor represents proficiency on these
general tasks.

   •  The same procedure (as for factor one) was used to sum the
      general task scales, standardize within methods, and add the
      two standardized scores.

3) Peer Leadership, Effort, and Self Development: Reflects the degree to which
the individual exerts effort over the full range of job tasks, perseveres under
adverse or dangerous conditions, and demonstrates leadership and support toward
peers. That is, can the individual be counted on to carry out assigned tasks,
even under adverse conditions, to exercise good judgment, and to be generally
dependable and proficient.

   •  Five scales from the Army-wide BARS rating form (gen. tech.
      performance, peer leadership, demonstrated effort, self
      development, gen. maintenance), the expected combat performance
      scales, the job specific BARS scales, and the total number of
      commendations and awards received by the individual were summed
      for this factor.

4) Maintaining Personal Discipline: Reflects the degree to which the
individual adheres to Army regulations and traditions, exercises personal
self control, demonstrates responsibility in day to day behavior, and does not
create disciplinary problems.

   •  Scores on this factor are composed of three Army-wide BARS
      scales (adherence to traditions and regulations, exercising
      self control, demonstrating integrity), a subscale from the
      combat rating pertaining to avoidance of trouble, and two
      indices from the administrative records (number of
      disciplinary actions and promotion rate).

5) Physical Fitness and Military Bearing: Represents the degree to which the
individual maintains an appropriate military appearance and bearing and stays
in good physical condition.

   •  Factor scores are the sum of the physical fitness qualification
      score from the individual's personnel record and the "military
      bearing and appearance" rating scale.
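
The scoring rule described for the two task proficiency factors (sum the subscales within each method, standardize each method total, then add the standardized totals) can be sketched as follows. The subscale scores are invented, and the use of the sample standard deviation for standardizing is an assumption, since the table does not say which was used.

```python
import statistics

def standardize(values):
    """Convert raw totals to z-scores using the sample standard deviation."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    return [(v - mean) / sd for v in values]

def task_proficiency_scores(knowledge_subscales, hands_on_subscales):
    """Sum subscales within each method, standardize, then add the methods.

    Each argument holds one list of subscale scores per soldier.
    """
    knowledge_totals = [sum(row) for row in knowledge_subscales]
    hands_on_totals = [sum(row) for row in hands_on_subscales]
    z_knowledge = standardize(knowledge_totals)
    z_hands_on = standardize(hands_on_totals)
    return [k + h for k, h in zip(z_knowledge, z_hands_on)]

# Invented percent-correct subscale scores for four soldiers.
knowledge = [[60, 70], [55, 65], [80, 75], [45, 50]]
hands_on = [[70, 80], [60, 75], [85, 90], [50, 65]]
core_technical = task_proficiency_scores(knowledge, hands_on)
print([round(score, 2) for score in core_technical])
```

Standardizing within method before adding keeps either method from dominating the factor score simply because its raw totals happen to be more variable.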




TABLE  4 


Measurement  Methods  Factors  in  Project  A  Job  Performance  Model 


1) Written Test Method: That portion of the common variance among
measures from the paper-and-pencil knowledge tests not predictable from
(i.e., correlated with) any of the other related factor or performance
construct scores.

2) Ratings Method: That portion of the common variance among measures
from the rating instruments not predictable from (i.e., correlated with)
any of the other related factor or performance construct scores.




PATTERNS IN SKILL LEVEL ONE PERFORMANCE IN
REPRESENTATIVE  ARMY  JOBS: 

COMMON  AND  TECHNICAL  TASK  COMPARISONS 


Roy C. Campbell
Charlotte  H.  Campbell 
Earl  L.  Doyle 

Human  Resources  Research  Organization 


Presented  on  Symposium, 

"Job  Performance:  What  Do  Soldiers  Know,  What  Can  They  Do?" 

At  the  Annual  Conference  of  the 
Military  Testing  Association 
Mystic,  Connecticut 

November  1986 


The views expressed in this paper are those of the authors and do not
necessarily  reflect  the  official  opinions  and  policies  of  the  U.S.  Army 
Research  Institute  or  the  Department  of  the  Army. 




Patterns  of  Skill  Level  One  Performance 
in  Representative  Army  Jobs: 

Common and Technical Task Comparisons

Roy  C.  Campbell 
Charlotte  H.  Campbell 
and 

Earl  L.  Doyle 

Human  Resources  Research  Organization 


In  the  project  for  Improving  the  Selection,  Classification  and 
Utilization  of  Army  Enlisted  Personnel,  commonly  known  as  Project  A,  nine 
jobs  or  military  occupational  specialties  (MOS)  were  covered  intensively  in 
the  concurrent  validation.  The  coverage  included,  among  other  measures, 
hands-on tests and written tests based on task samples for each MOS. The
MOS,  along  with  the  number  tested  for  each  method,  are  shown  in  Table  1. 


Table 1

MOS and Number Tested


MOS   SL1 Title                       Written N   Hands-On N

11B   Infantryman                        678         682
13B   Cannon Crewman                     639         619
19E   Armor Crewman                      459         474
31C   Single Channel Radio Operator      326         341
63B   Light Wheel Vehicle Mechanic       596         569
64C   Motor Transport Operator           668         640
71L   Administrative Specialist          501         494
91A   Medical Specialist                 483         496
95B   Military Police                    665         665

Army  doctrine  specifies  that  all  skill  level  one  soldiers  are 
responsible  for  being  able  to  perform  all  tasks  in  their  MOS  skill  level  one 
Soldier's  Manual  (SM)  as  well  as  the  tasks  listed  in  the  skill  level  one 
Soldier's Manual of Common Tasks (SMCT). This latter document lists those
tasks,  known  as  Common  Tasks,  that  every  soldier,  regardless  of  job  or 
location,  must  be  able  to  perform  to  survive  in  a  hostile  combat  environment. 


This  research  was  funded  by  the  U.S.  Army  Research  Institute  for  the 
Behavioral and Social Sciences, Contract No. MDA903-82-C-0531. All
statements  expressed  in  this  paper  are  those  of  the  authors  and  do  not 
necessarily  express  the  official  opinions  or  policies  of  the  U.S.  Army 
Research  Institute  or  the  Department  of  the  Army. 




For Project A, the domain definition for each MOS consisted of these two
types of tasks: those that were included because they were dictated by the
soldier's job (MOS-specific or Technical tasks) and those that were included
because Army doctrine requires all soldiers to perform minimum essential
tasks dictated by exposure to wartime conditions (Common tasks). The final
process in which tasks from each domain were selected for testing was
structured so that the selection would represent the full range
of task requirements in an MOS. Thus, for each MOS, the tasks tested include
both Technical and Common tasks in both the hands-on and written components.

To  be  sure,  the  distinction  between  Technical  and  Common  tasks  is 
sometimes  artificial.  The  skill  level  one  soldier  being  trained  probably 
does not discriminate between the two categories. And in many MOS, such as
11B, 95B, and 91A, there is little actual job distinction between
MOS-specific and Common tasks. In these, and in some other MOS, if a task
did not already exist in the SMCT, the job requirements would dictate that the
task be included as an MOS-specific task.

Yet much is made of Common Task requirements. The specific task
concept for Common tasks began emerging in 1976 but is based on the long
established Army tradition and concept that all soldiers, in combat, may be
called upon to fulfill certain survival functions. The complexity of the
modern battlefield has compounded, not diminished, this requirement. SMCT
tasks receive as much attention and revision emphasis by TRADOC as do any of
the MOS-specific technical tasks. Units are required to test selected common
tasks annually. Army Training and Evaluation Program (ARTEP) and field
exercises for all types of units emphasize combat survival along with unit
mission performance. But there are differences in emphasis as well. The 11B
Infantryman literally lives with his M16 rifle; the 71L Administrative
Specialist may only draw his/her M16 for maintenance and quarterly or
semi-annual training. Yet by doctrine, each is equally responsible for certain
M16 tasks. The question, then, is whether there are distinctions among Army
jobs in the performance of Common tasks and also whether there are significant
distinctions between performance on Technical tasks and Common tasks. Project
A, with the test results from over 5000 soldiers, provided an opportunity to
examine this issue.


Method 


In  the  9  MOS,  a  total  of  290  individual  tasks  were  tested.  Table  2 
shows  a  breakout  of  these  tasks  by  Technical  and  Common  category  and  by  test 
component.  Almost  all  tasks  tested  in  the  hands-on  component  were  also 
tested  in  the  written  component;  however,  there  were  some  tasks  tested  by 
written  component  only. 

Table 2

Distribution of Observations by Test Component


              Hands-On   Written

Technical        89        158
Common           60        123



The first analysis considered all the MOS combined (Table 3). There was
only a slight, nonsignificant difference on hands-on results between the
Common and Technical domains, the apparent difference being accounted for by
the larger variance in performance in the Technical tasks. In the written
tests, however, the difference in performance is significant, with higher
performance levels reflected in the Common task performance. It should be
noted, however, that this difference may be the result of test difficulty. As
yet, no overall item analysis of the written tests has been performed to
identify difficulty patterns.

Table 3

Comparison of Technical and Common Task Performance on Hands-On
and Written Tests for Nine MOS Combined


                                     Test Component
Tasks                             Hands-On      Written

Technical    N of Tasks              89           158
             Mean %                 68.2          57.6
             S.D.                   19.2          12.8

Common       N of Tasks              60           123
             Mean %                 73.3          63.7
             S.D.                   15.1          12.9

Test of Difference Between       t = 1.721     t = 3.948
Common and Technical             p < .09       p < .001
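
The t values in Table 3 can be approximately reproduced from the summary statistics alone. The sketch below assumes a pooled-variance two-sample t test with tasks as the unit of analysis; the paper does not state which form of the test was used, so this is a plausibility check rather than a reconstruction of the authors' computation.

```python
import math

def pooled_t(n1, mean1, sd1, n2, mean2, sd2):
    """Two-sample t statistic and df using a pooled variance estimate."""
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df
    se = math.sqrt(pooled_var * (1.0 / n1 + 1.0 / n2))
    return (mean1 - mean2) / se, df

# Common tasks first, Technical tasks second, values copied from Table 3.
t_hands_on, df_hands_on = pooled_t(60, 73.3, 15.1, 89, 68.2, 19.2)
t_written, df_written = pooled_t(123, 63.7, 12.9, 158, 57.6, 12.8)

print(f"hands-on: t = {t_hands_on:.2f} on {df_hands_on} df")
print(f"written:  t = {t_written:.2f} on {df_written} df")
```

Both results agree with the tabled values to within rounding of the reported means and standard deviations.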

Although the nine MOS were carefully selected to represent the entire
domain of Army jobs, the Technical/Common tasks analysis continued by looking
at the nine MOS broken down into families. These family classifications
followed the groupings developed by McLaughlin, Rossmeissl, Wise, Brandt, and
Wang (1984). Three families are represented: Family I is Combat (11B, 13B,
19E), Family II is Operations (31C, 63B, 64C), and Family III is
Skilled/Technical (71L, 91A, 95B). (The 71L MOS actually belongs in a fourth
job family, Clerical, but we have grouped it with the Skilled/Technical MOS
for the analyses reported here.)

Table 4 shows the results by this family breakout. For the written
tests there are no significant differences in performance between families;
that is, there is no evidence that family membership affects outcome. In the
hands-on tests, however, there appears to be a significant difference by
family, with Families I, III, and II (in ascending order) each separated by
about 5 points in performance. Closer examination, however, reveals that much
of this difference by family is due to
interaction between Common and Technical tasks within the family. Common
task performance across families is quite consistent. The difference between
families is accounted for almost solely by the Technical tasks, with 17
points difference in mean performance between the two most separated
families.
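
The family-level figures quoted above follow from the hands-on cell means in Table 4. The sketch below weights each cell mean by its number of tasks, which is an assumption about how the overall family means were formed, and then compares the spread of Technical and Common means across families.

```python
# (number of tasks, mean percent) for hands-on tasks, copied from Table 4.
hands_on = {
    "I - Combat": {"technical": (36, 61.4), "common": (19, 72.8)},
    "II - Operations": {"technical": (24, 78.4), "common": (22, 73.7)},
    "III - Skilled/Technical": {"technical": (29, 68.3), "common": (19, 73.3)},
}

def family_mean(cells):
    """Task-weighted mean across the Technical and Common cells."""
    total_tasks = sum(n for n, _ in cells.values())
    return sum(n * mean for n, mean in cells.values()) / total_tasks

family_means = {name: family_mean(cells) for name, cells in hands_on.items()}
technical_means = [cells["technical"][1] for cells in hands_on.values()]
common_means = [cells["common"][1] for cells in hands_on.values()]
technical_range = max(technical_means) - min(technical_means)
common_range = max(common_means) - min(common_means)

for name, mean in family_means.items():
    print(f"{name}: {mean:.1f}")
print(f"Technical range: {technical_range:.1f}, Common range: {common_range:.1f}")
```

The Technical means span about 17 points across families while the Common means span less than one point, which is the pattern the paragraph describes.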




Table 4

Performance Results Based on Family Membership


                                            Job Family
                                                           III - Skilled/
                           I - Combat   II - Operations      Technical

Hands-On Component Tasks

  Technical   N of Tasks       36             24                29
              Mean %          61.4           78.4              68.3
              S.D.            23.3           12.0              14.7

  Common      N of Tasks       19             22                19
              Mean %          72.8           73.7              73.3
              S.D.            16.9           15.8              12.9

Written Component Tasks

  Technical   N of Tasks       63             49                46
              Mean %          55.1           58.1              60.4
              S.D.            13.0           11.2              13.0

  Common      N of Tasks       44             39                40
              Mean %          63.0           63.0              64.2
              S.D.            12.9           13.7              12.3


Analysis of Variance: Job Family x Technical/Common

Hands-On Component

Source                            SS       df       MS        F      p

Job Family                     1910.16      2     955.08     3.28   .04
Technical/Common                543.78      1     543.78     1.86   .17
Family x Technical/Common      1561.08      2     780.54     2.68   .07
Error                         41694.06    143     291.57

Written Component

Source                            SS       df       MS        F      p

Job Family                      508.83      2     254.41     1.55   .21
Technical/Common               2316.63      1    2316.63    14.14   .00
Family x Technical/Common       205.18      2     102.59      .63   NS
Error                         45062.95    275     163.86
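
The F ratios in the analysis of variance can be recomputed from the sums of squares and degrees of freedom in Table 4. The sketch below only repeats that arithmetic (mean square = SS / df, F = effect mean square / error mean square); it does not refit the model, since the task-level data are not reported.

```python
def f_ratio(sum_of_squares, df, ms_error):
    """Return the F ratio for one effect given the error mean square."""
    return (sum_of_squares / df) / ms_error

# (SS, df) for each source, copied from Table 4.
anova = {
    "hands_on": {
        "error": (41694.06, 143),
        "effects": {
            "Job Family": (1910.16, 2),
            "Technical/Common": (543.78, 1),
            "Family x Technical/Common": (1561.08, 2),
        },
    },
    "written": {
        "error": (45062.95, 275),
        "effects": {
            "Job Family": (508.83, 2),
            "Technical/Common": (2316.63, 1),
            "Family x Technical/Common": (205.18, 2),
        },
    },
}

f_values = {}
for component, table in anova.items():
    error_ss, error_df = table["error"]
    ms_error = error_ss / error_df
    f_values[component] = {
        source: f_ratio(ss, df, ms_error)
        for source, (ss, df) in table["effects"].items()
    }

for component, effects in f_values.items():
    for source, f in effects.items():
        print(f"{component:9s} {source:27s} F = {f:.2f}")
```

The recomputed ratios match the tabled F values to two decimals, so the table is internally consistent.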



Within families, however, there is always a significant difference
between Technical task performance and Common task performance. This
performance difference is not entirely consistent, however: in Families I and
III, Common task performance is better than Technical performance. In Family
II, the opposite is true for the hands-on test, although the trend shown in
Families I and III holds true for the written tests.


Conclusions 


For a variety of reasons, relative differences in performance between
Army jobs were expected. These differences can be variously attributed to
innate task difficulty, assignments, training emphasis, and even entrance
requirements into the MOS. However, the Army policy
regarding Common task proficiency appears to be working. While differences
in performance between groups of MOS showed up as expected, these
differences were almost entirely attributable to Technical tasks within each
group. Common task performance is remarkably uniform between Family groups.
Based on Project A results, it would appear that Army Common Task Management
has produced its desired results.


References 


McLaughlin, D. H., Rossmeissl, P. G., Wise, L. L., Brandt, D. A., & Wang, M.
(1984). Validation of current and alternative ASVAB area composites,
based on training and SQT information on FY1981 and FY1982 enlisted
accessions (ARI Technical Report 651). Alexandria, VA: U.S. Army
Research Institute for the Behavioral and Social Sciences.


EFFECTS  OF  TEST  PROGRAMS  ON  TASK  PROFICIENCY 


Patrick  Ford 
R.  Gene  Hoffman 

Human  Resources  Research  Organization 


Presented  on  Symposium, 

"Job  Performance:  What  Do  Soldiers  Know,  What  Can  They  Do?" 

At  the  Annual  Conference  of  the 
Military  Testing  Association 
Mystic,  Connecticut 

November  1986 


The  views  expressed  in  this  paper  are  those  of  the  authors  and  do  not 
necessarily  reflect  the  official  opinions  and  policies  of  the  U.S.  Army 
Research  Institute  or  the  Department  of  the  Army. 




Effects  of  Test  Programs  on  Task  Proficiency 

Patrick Ford and R. Gene Hoffman
Human  Resources  Research  Organization 


The general purpose of Project A is to predict job performance by
establishing the relationship between entry measures and performance on a
sample of job tasks in nine selected MOS (Eaton, Goer, Harris, & Zook, 1984).
At a conceptual level, the relation between applicants' ability and the
tasks on a job ought to be stable so long as the job does not change. In
practice, however, there are several mediators between ability and
performance. Among the potential mediators are test programs that focus
individual training in units. In these programs a central agency
establishes a set of tasks that are to be tested and, presumably, trained
in units. Data collected for Project A during June to November 1985 provide
an opportunity to look at the effect of these programs on soldier
performance.

This  paper  considers  three  programs  that  may  affect  task  proficiency: 

•  Common Task Test (CTT). This is a hands-on test that all
soldiers are to take each year. The Training and Doctrine
Command selects a subset of tasks from the skill level one
Soldier's Manual of Common Tasks. During the Project A
data collection the operative CTT had 19 tasks. Across the
nine MOS, the test samples for Project A included 14 of them.
For comparison, 25 other non-CTT common tasks were also in
the Project A data base.

•  Expert Infantry Badge (EIB). This is a hands-on test that
is administered to eligible infantrymen (MOS 11B). During
the data collection it included 21 tasks, of which 8 were
included in the Project A 11B sample. The 11B test battery
included 21 other tasks.

•  Expert Field Medical Badge (EFMB). This is a written and
hands-on test that is administered to medical specialists
(MOS 91A). During the data collection the hands-on section
included 32 tasks, of which 10 were included in the Project A
91A sample. The 91A test battery included 20 other tasks.

The  criterion  measures  for  looking  at  the  effect  of  the  test  programs 
are  results  from  tests  administered  as  part  of  Project  A.  There  are  two 
types  of  criterion  measures: 

•  Hands-On Tests - These tests were based on direct observation
of a soldier's performance of a job task. The tests were
developed to provide consistent conditions for performance.
Scores were percent of steps performed correctly or, in some
cases, percent of product prepared correctly. There was a
separate score for each test.

•  Written Tests - These tests were in a multiple-choice
format. Items were organized into subtests, with each
subtest corresponding to a job task. The score was
percent correct by task.

This research was funded by the U.S. Army Research Institute for the
Behavioral and Social Sciences, Contract No. MDA903-82-C-0531. All
statements expressed in this paper are those of the authors and do not
necessarily express the official opinions or policies of the U.S. Army
Research Institute or the Department of the Army.

During the data collection (which was the Project A Concurrent
Validation), the tests were administered to over 5000 skill level one
(SL1) soldiers in nine MOS. The MOS covered, along with the number of
soldiers tested by each method, are shown in Table 1.

Table 1

MOS and Number Tested

MOS    SL1 Title                          Written N    Hands-On N

11B    Infantryman                           678          682
13B    Cannon Crewman                        639          619
19E    Armor Crewman                         459          474
31C    Single Channel Radio Operator         326          341
63B    Light Wheel Vehicle Mechanic          596          569
64C    Motor Transport Operator              668          640
71L    Administrative Specialist             501          494
91A    Medical Specialist                    483          496
95B    Military Police                       665          665

Approach 

The CTT analyses were limited to the SL1 common tasks (defined as tasks
included in the SL1 Soldier's Manual of Common Tasks). Performance on
Project A tasks that were also on the CTT was compared with performance on
Project A SL1 common tasks that were not on the CTT. The comparison was made
on two levels: across all nine MOS and by MOS family. The MOS families were
based on previous work (Rossmeissl, Wise, Brandt, & Wang, 1984) that
identified four families: combat (11B, 13B, 19E); operations (31C, 63B,
64C); clerical (71L); and skilled technical (91A, 95B). The CTT analyses
combined the clerical and skilled technical families. The analyses were
conducted separately for hands-on and written criteria.

The analysis of specific MOS programs included all Project A tasks for
MOS 11B and 91A, respectively. Two comparisons per method were conducted for
each program: (1) MOS program (EIB or EFMB) tasks and CTT tasks with Project
A only tasks and (2) MOS program tasks with tasks not covered by the MOS
program (including CTT tasks that were not in EIB or EFMB, respectively).




Results 


The CTT comparisons are summarized in Table 2. Whether the differences
are statistically significant depends on the orientation of the interpreter.
If the question is simply "Does performance on this particular set of CTT
tests differ from performance on this particular set of non-CTT tests?"
essentially all of the differences would be statistically significant. That
is, with test scores as percents, the extremely large number of soldiers
tested yields standard errors of the mean for most tests of approximately
.9. A more conservative standard is required, however, if the tests are
treated as samples of their domain and the pertinent question is "Does
performance on all tasks in the CTT domain differ from performance on all
tasks in the non-CTT domain?" For the second question, the N is the number
of tasks sampled within task categories (e.g., CTT/Non-CTT) rather than the
number of soldiers.
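
The contrast between the two choices of N can be illustrated with the usual standard error of a mean, SD divided by the square root of N. The sketch below is illustrative only: the soldier-level SD of 20 percentage points and N of 500 are assumed values chosen to land near the ".9" quoted above (the source does not report them), while the task-level figures are the hands-on "All" rows of Table 2.

```python
import math


def standard_error(sd, n):
    """Standard error of a mean: SD divided by the square root of N."""
    return sd / math.sqrt(n)


# Soldiers as the unit of analysis. Both numbers are ASSUMED for
# illustration; the source states only that the result is about .9.
se_soldiers = standard_error(sd=20.0, n=500)

# Tasks as the unit of analysis, using the hands-on "All" rows of Table 2.
se_ctt_tasks = standard_error(sd=8.31, n=28)
se_non_ctt_tasks = standard_error(sd=18.84, n=32)

if __name__ == "__main__":
    print(f"soldiers as N:      SE = {se_soldiers:.2f}")
    print(f"CTT tasks as N:     SE = {se_ctt_tasks:.2f}")
    print(f"non-CTT tasks as N: SE = {se_non_ctt_tasks:.2f}")
```

With tasks as the unit, the standard errors are several times larger, which is why the same mean differences can fail the more conservative test.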

The CTT comparisons were analyzed by means of a two-way analysis of
variance using tasks as subjects, with program membership as independent
variables. Following the conservative interpretation (N as number of tasks),
none of the differences are significant.

Table 2

Summary of Results on CTT Tasks and Non-CTT Common Tasks

                                                         N of
Test Mode      Task Type    Family                       Tasks    Mean     S.D.

Hands-On       CTT          All                            28    76.73     8.31
(60 cases)                  Combat                          8    79.67     7.99
                            Operations                     11    74.92     7.48
                            Skilled Tech. & Clerical        9    76.01     9.68

               Project A    All                            32    70.40    18.84
               Common       Combat                         11    67.87    20.16
                            Operations                     11    72.49    21.61
                            Skilled Tech. & Clerical       10    70.88    15.43

Written        CTT          All                            56    65.77    12.54
(123 cases)                 Combat                         18    66.13    13.04
                            Operations                     18    67.93    12.34
                            Skilled Tech. & Clerical       20    63.51    12.54

               Project A    All                            67    61.91    13.01
               Common       Combat                         26    60.87    12.64
                            Operations                     21    60.37    14.20
                            Skilled Tech. & Clerical       20    64.87    12.32



The EIB comparisons are summarized in Table 3. Here both hands-on
comparisons are significant: special program (EIB or CTT) with no special
program (F = 8.062, p < .02); and EIB with non-EIB (F = 6.21, p < .05).
Neither written comparison approaches significance.


Table 3

Summary of Results on 11B Special Program Tasks

                                     N of
Test Mode      Task Type             Tasks    Mean     S.D.

Hands-On       EIB & CTT                8    80.19    12.70
(13 cases)     Project A Only           5    57.26    16.50

               EIB                      6    79.70    14.22
               Non-EIB                  7    64.23    18.49

Written        EIB & CTT               13    61.58     9.51
(28 cases)     Project A Only          15    59.18    12.37

               EIB                      8    62.21     9.80
               Non-EIB                 20    60.03    11.69
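
For readers who want to relate the tabled means and standard deviations to a significance test with tasks as cases, the sketch below computes a two-group F from summary statistics alone. It assumes a simple one-way comparison with a pooled error variance; the source does not describe its exact computation, so the result should be read as an approximation to compare against the F reported in the text, not as a reproduction of the authors' analysis. The function name is introduced here for illustration.

```python
def two_group_f(n1, mean1, sd1, n2, mean2, sd2):
    """One-way ANOVA F for two groups, computed from summary statistics.

    With two groups, F equals the square of the pooled-variance t statistic
    and has (1, n1 + n2 - 2) degrees of freedom.
    """
    df_error = n1 + n2 - 2
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df_error
    f_value = (mean1 - mean2) ** 2 / (pooled_var * (1 / n1 + 1 / n2))
    return f_value, df_error


# Hands-on rows of the table above: EIB & CTT tasks vs. Project A Only tasks.
F_HANDS_ON, DF_ERROR = two_group_f(8, 80.19, 12.70, 5, 57.26, 16.50)

if __name__ == "__main__":
    print(f"F(1, {DF_ERROR}) = {F_HANDS_ON:.2f}")
```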

The EFMB comparisons are summarized in Table 4. None of the differences
are significant.

Table 4

Summary of Results on 91A Special Program Tasks

                                     N of
Test Mode      Task Type             Tasks    Mean     S.D.

Hands-On       EFMB & CTT               6    75.27     6.25
(16 cases)     Project A Only          10    70.58    12.49

               EFMB                     5    76.43     6.24
               Non-EFMB                11    70.49    11.86

Written        EFMB & CTT              14    68.72     9.27
(30 cases)     Project A Only          16    65.66    13.06

               EFMB                    11    70.33     9.40
               Non-EFMB                19    65.21    12.21

Discussion 


Our reluctance to call the CTT differences significant ought not to be
interpreted to mean that the CTT program makes no difference. All it means
is that there is so much variation among the hands-on means for Project A
only and so few cases overall that we cannot say with confidence that
hands-on performance on any set of tasks selected for CTT will be better than
performance on tasks not selected. It is possible, for example, that some of
the tasks not selected are more complex or require greater coordination than
tasks selected for CTT. Besides the possible sampling error among tasks, we
must also remember that the Project A results are a snapshot of a wide range
of units at different points in their training cycles. Since the CTT effect
could weaken over time, any evaluation that does not minimize the delay
understates the effect.

The results do suggest that the portion of the CTT captured by Project A
during the summer of 1985 had a positive association with hands-on scores.
It is somewhat surprising that the difference was strongest in the MOS in
the Combat family.

The EIB appears to be a very powerful program and must be considered
when interpreting criterion data on 11B. It is less clear, however, that
similar programs would achieve comparable results in any MOS. The impact of
the EFMB on 91A, for example, is not nearly as dramatic. Among the myriad
explanations for the difference, two seem to be especially appropriate.
First, the EFMB has not had time to develop the credibility that the EIB has.
The credibility of the program affects the number of people who are tested
and, probably more important, the intensity of training that precedes the
testing. Second, there may be a ceiling effect for 91A. Performance of
medical specialists may be high enough without the program that any
increment is small.


Conclusion 


The impact of test programs on soldier performance is ambiguous. No
program considered in this paper had a meaningful effect on performance as
measured by written tests. The 1985 CTT apparently affected hands-on results
in the Project A data, but we cannot generalize that a comparable effect will
occur every year. The effect was to equalize performance on a subset of
common tasks across MOS, mainly by increasing hands-on performance of
soldiers in combat MOS. The EIB program had a strong effect on hands-on
performance of infantrymen and should be considered as a moderator of 11B
performance. However, the EFMB program for medical specialists, though
parallel to the EIB, did not have a comparable effect.


References 


Eaton, N. K., Goer, M. H., Harris, J. H., & Zook, L. M. (Eds.). (1984).
Improving the selection, classification, and utilization of Army
enlisted personnel: Annual report, FY1984 (ARI Technical Report 660).
Alexandria, VA: U.S. Army Research Institute for the Behavioral and
Social Sciences.

McLaughlin, D. H., Rossmeissl, P. G., Wise, L. L., Brandt, D. A., & Wang, M.
(1984). Validation of current and alternative ASVAB area composites,
based on training and SQT information on FY1981 and FY1982 enlisted
accessions (ARI Technical Report 651). Alexandria, VA: U.S. Army
Research Institute for the Behavioral and Social Sciences.




EFFECTS  OF  SOLDIER  PERFORMANCE  AND  CHARACTERISTICS  ON 
RELATIONSHIPS  WITH  SUPERIORS 


Ilene F. Gast
Leonard A. White

U.S. Army Research Institute


Presented  on  Session,  "Leadership" 

At  the  Annual  Conference  of  the 
Military  Testing  Association 
Mystic,  Connecticut 

November  1986 


The views expressed in this paper are those of the authors and do not
necessarily reflect the official opinions and policies of the U.S. Army
Research Institute or the Department of the Army.




Effects of Soldier Performance
and Characteristics on Relationships with Superiors*

Ilene F. Gast and Leonard A. White
U.S. Army Research Institute for the Behavioral and Social Sciences

With the increasing emphasis on interactive leadership approaches (Jacobs,
1971; Graen, 1976) has come a recognition of the contributions subordinates
make to the leadership process. Although leaders may tend to have a
characteristic style, they vary their behavior substantially in response to
subordinate actions and needs. Graen has shown that leaders form different
kinds of working relationships with their subordinates. Relationships range
from "in-group" ones characterized by mutual support and trust to
"out-group" ones where both parties do only what is required by the formal
employment contract.

Graen (1976) notes that relationships formed early in one's career have
lasting effects. Based on a longitudinal investigation of management
trainees, Wakabayashi and Graen (1984) conclude that a newcomer's
relationship with his or her superior serves motivating and mentoring
functions that help the newcomer to assimilate into the organization and to
gain access to information and resources central to the functioning of the
work unit. This experience gives newcomers the confidence they need to set
higher performance goals. Thus, the relationships that first tour soldiers
form with their superiors are not only important from the standpoint of
socialization into the Army, but may also affect career progression and
leadership potential.

Past research has shown that subordinates' performance is a powerful
determinant of subsequent treatment by superiors (e.g., Greene, 1975).
Generally, poor performers are more likely to have low quality relationships
with their superiors. However, because this phenomenon has been investigated
primarily in the laboratory, field research is needed.

There also is evidence that relatively stable personal dispositions
enable some subordinates to form more positive relationships with superiors.
Graen and his associates (Graen, Novak, & Sommerkamp, 1982) demonstrated the
importance of subordinates' growth need strength to the formation of
effective relationships with superiors. However, with the exception of
Graen's work and research by Hough, Gast, White, and McCloy (1986),
researchers have not adequately addressed the potential effects of
individual differences on subordinates' interactions with their superiors.
Such research is needed.

Using data from Project A: Improving the Selection, Classification, and
Utilization of Army Enlisted Personnel (Eaton, Goer, Harris, & Zook, 1984),
this paper examines how working relationships between superiors and
subordinates are directly affected by subordinates' job performance,
temperament, and ability. In addition, this paper explores non-linear effects
of soldier ability and temperament on working relationships with superiors.

Method 

Subjects 


Subjects were 5,123 first term soldiers in 9 military occupational
specialties (MOS): 683 infantrymen (11B), 636 cannon crew members (13B), 489
tank crew members (19E), 349 radio teletype operators (31C), 618 light wheel
vehicle mechanics (63B), 670 motor transport operators (64C), 502
administrative specialists (71L), 487 medical specialists (91A), and 689
military police (95B). Within the sample, 88% of the soldiers were male and
12% female. Of those who reported their racial origin, 23% were black, 3%
were hispanic, 70% were white, and 4% replied "other". On the average,
soldiers had been in the Army for 18 months and with their present companies
for about a year. To facilitate data analysis, jobs were grouped into four
occupational clusters identified by McLaughlin, Rossmeissl, Wise, Brandt,
and Wang (1984). The Combat cluster included MOS 11B, 13B, and 19E; MOS 31C,
63B, and 64C comprised the Operations cluster; MOS 71L made up the Clerical
cluster; and the remaining MOS, 91A and 95B, comprised the Skilled Technical
cluster.

*The views expressed in this paper are those of the authors and do not
necessarily reflect the view of the U.S. Army Research Institute or the
Department of the Army.

Instruments


Supervisor Behavior Questionnaire. The authors wrote items to tap
categories of supervisory activities identified through analysis of 400
behavioral examples of effective and ineffective leadership. These items
required subjects to rate statements about their supervisor using a 5-point
scale from Very Seldom or Never (1) to Very Often or Always (5). The
resulting questionnaire was field tested in a sample of 696 first term
enlisted soldiers (White, Gast, & Rumsey, 1985) and revised prior to
administration in the present sample. Principal factor analysis with promax
rotation revealed five factors with eigenvalues greater than one:
Inspiration/Support, Participation, Structuring Work, Fairness/Discipline,
and Work Allocation. The present research employed only the scales
corresponding to the first two factors; these scales were most similar to
scales measuring qualities of "in-group" relationships in previous research
(Vecchio & Gobdel, 1984; Novak, 1985). Typical items on the 9-item
Inspiration/Support scale included "Your supervisor understands your
problems and needs" and "Your supervisor wants to make you give your best
effort". The 4-item Participation scale contained items like "You are
permitted to use your own judgment in solving problems". Reliabilities
(Cronbach's alpha) for these two scales were .82 and .70, respectively.
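
For readers unfamiliar with the reliability index cited above, the following is a minimal sketch of how Cronbach's alpha is computed from a respondents-by-items score matrix. The response data are hypothetical and were invented for this illustration; they are not Project A data, and the function name is an assumption of this example.

```python
import numpy as np


def cronbach_alpha(item_scores):
    """Cronbach's alpha for a (respondents x items) array of item scores."""
    scores = np.asarray(item_scores, dtype=float)
    n_items = scores.shape[1]
    item_variances = scores.var(axis=0, ddof=1)      # variance of each item
    total_variance = scores.sum(axis=1).var(ddof=1)  # variance of scale totals
    return (n_items / (n_items - 1)) * (1.0 - item_variances.sum() / total_variance)


# HYPOTHETICAL responses: 6 soldiers rating 4 items on the 1-5 scale.
EXAMPLE = np.array([
    [4, 5, 4, 4],
    [2, 2, 3, 2],
    [5, 4, 5, 5],
    [3, 3, 3, 4],
    [1, 2, 1, 2],
    [4, 4, 5, 4],
])

if __name__ == "__main__":
    print(f"alpha = {cronbach_alpha(EXAMPLE):.2f}")
```

Alpha rises as items covary more strongly relative to their individual variances, which is why a 9-item scale of related items can reach a higher value than a 4-item scale.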

General cognitive ability. The Armed Services Vocational Aptitude Battery
(ASVAB) was administered to all subjects prior to entering military service.
A composite of four ASVAB subtests, known as the Armed Forces Qualification
Test (AFQT), served as the measure of general cognitive ability.

Temperament. Hough, Kamp, and Barge (1984) developed ten scales to assess
temperament constructs shown to be related to criteria of work performance in
previous studies. The resulting inventory, Assessment of Background and Life
Experiences (ABLE), was tested on 470 soldiers at three forts. These data
guided revisions to the items and scales. When subjected to principal factor
analysis with varimax rotation, the revised scales yielded three factors with
eigenvalues greater than one: Dependability, Achievement Orientation, and
Emotional Stability. Scales measuring self-esteem, dominance, energy level,
and work orientation comprise the Achievement Orientation factor. The
Emotional Stability factor assesses the degree of stability vs. reactivity of
emotions. The Dependability factor includes measures of conscientiousness,
non-delinquency, support for rules and regulations, and respect for
traditional values. Factor scores for these three scales were used in the
analyses.

Job knowledge tests. Through job analysis, important knowledge areas were
identified for each MOS. Project A personnel, assisted by subject matter
specialists, developed items to tap these knowledges. The overall job
knowledge test score was the percentage of items answered correctly by each
soldier.




Hands-on task proficiency tests. Critical tasks were identified to
represent the task domain for each MOS. A multiple step proficiency test was
developed for each task, and each step was scored pass or fail. For each
task, the score was the proportion of steps passed; then these task scores
were averaged to yield an overall hands-on test score (Campbell, Campbell,
Rumsey, & Edwards, 1985).
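
The two-stage scoring just described (proportion of steps passed per task, then the mean across tasks) can be sketched as follows. The soldier record, task names, and step outcomes are hypothetical and exist only to make the arithmetic concrete.

```python
def task_score(step_results):
    """Proportion of steps passed on one task (True = pass, False = fail)."""
    return sum(step_results) / len(step_results)


def hands_on_score(tasks):
    """Overall hands-on score: the unweighted mean of the task scores."""
    scores = [task_score(steps) for steps in tasks.values()]
    return sum(scores) / len(scores)


# HYPOTHETICAL soldier; task names and step outcomes are invented.
SOLDIER = {
    "task_a": [True, True, False, True],          # 3 of 4 steps passed
    "task_b": [True, True],                       # 2 of 2 steps passed
    "task_c": [False, True, True, True, False],   # 3 of 5 steps passed
}

if __name__ == "__main__":
    print(f"overall hands-on score = {hands_on_score(SOLDIER):.3f}")
```

Because task scores are averaged, each task counts equally regardless of how many steps it has; pooling all steps into a single proportion would instead weight longer tasks more heavily.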

Army-wide performance rating scales. Eleven 7-point behaviorally anchored
rating scales were developed to assess soldier effectiveness across Army
jobs. These scales went beyond task performance to include aspects of
socialization and commitment to the organization. Ten scales covered
specific aspects of soldier effectiveness; the eleventh scale required an
assessment of overall effectiveness. Supervisors' ratings on this eleventh
scale were employed in the present analyses (Campbell et al., 1985).

Procedure 

After receiving training in the use of the behaviorally anchored rating
scales, supervisors, in groups of 3-15, evaluated their subordinates. The
mean number of supervisors providing the ratings for each ratee ranged from
1.66 to 1.83. Ratings were averaged across raters to form an overall
Army-wide effectiveness rating for each ratee. Tests of job knowledge and
hands-on task proficiency were also administered to the soldiers.

Results  and  Discussion 

The performance measures (i.e., hands-on, job knowledge, and supervisory
ratings) were standardized within each MOS cluster. Then, moderated
regression techniques were used to examine determinants of leadership within
each MOS cluster. The "moderating" effect of one independent variable on
another is indicated by a significant increase in explained variance due to
entry of the cross-product term after all main effects have been entered
into the model. Separate models were constructed for Supportive and for
Participatory leadership. Explanatory variables were entered into the
equations in sets; models were tested in the following sequence: (a) main
effects of individual difference variables, (b) main effects of performance
variables, (c) all main effects, (d) all main effects and interactions
between ability and temperament variables, (e) all main effects and
interactions among temperament variables, and (f) all main effects and all
interactions.
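
The logic of a moderated regression test can be sketched in a few lines: fit the main-effects model, add the cross-product term, and test the increment in explained variance. The example below is a minimal illustration on synthetic data with one ability and one temperament predictor; the variable names, effect sizes, and sample size are assumptions of this sketch and do not reproduce the Project A models, which entered several predictors and interaction sets.

```python
import numpy as np


def r_squared(predictors, y):
    """R^2 from an ordinary least squares fit with an intercept."""
    design = np.column_stack([np.ones(len(y)), predictors])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    residuals = y - design @ coef
    return 1.0 - residuals.var() / y.var()


def moderation_test(x1, x2, y):
    """Hierarchical test of a cross-product term entered after main effects.

    Returns (R^2 main effects, R^2 with interaction, F for the increment).
    """
    main = np.column_stack([x1, x2])
    full = np.column_stack([x1, x2, x1 * x2])
    r2_main, r2_full = r_squared(main, y), r_squared(full, y)
    df_error = len(y) - full.shape[1] - 1  # n - predictors - intercept
    f_change = (r2_full - r2_main) / ((1.0 - r2_full) / df_error)
    return r2_main, r2_full, f_change


# SYNTHETIC data with a built-in interaction; not Project A data.
rng = np.random.default_rng(0)
ability = rng.standard_normal(500)
temperament = rng.standard_normal(500)
leadership = (0.3 * ability + 0.4 * temperament
              + 0.2 * ability * temperament + rng.standard_normal(500))
R2_MAIN, R2_FULL, F_CHANGE = moderation_test(ability, temperament, leadership)

if __name__ == "__main__":
    print(f"R2 main = {R2_MAIN:.3f}, R2 full = {R2_FULL:.3f}, "
          f"F change = {F_CHANGE:.2f}")
```

The F for the increment has 1 and n - 4 degrees of freedom here because a single cross-product term is added; entering a set of interaction terms would divide the R^2 increment by the number of terms added.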

Table 1 summarizes the results from all models tested. Among the
performance measures, supervisors' assessments of subordinate performance
predict reported leadership most consistently. Looking across all of the
models, work sample performance and job knowledge do not contribute
significantly to the prediction of Supportive leadership. Task proficiency
predicts participation within the Operations and Skilled Technical MOS
clusters; job knowledge predicts participation within the Operations
cluster.

Independently, the set of temperament variables accounts for at least as
much variance in reported leadership as individual differences in job
performance do. When considered apart from the performance measures, the
three temperament measures are significant predictors of rated leadership
across MOS. Combined with the performance measures, the independent
contribution of the temperament measures weakens somewhat, suggesting that
these measures share variance with supervisory ratings. Cognitive ability is
a significant predictor in only one MOS cluster, the combat related jobs.
Although ability does not generally make a direct contribution to the
prediction of rated leadership, past research suggests that it is an
antecedent of job knowledge which, in turn, affects hands-on performance and
supervisors' assessments of subordinate performance (White, Borman, Hough, &
Hoffman, 1986). Given the mediational role that job knowledge plays, its
failure to have a direct effect on reported leadership in the present
investigation is not surprising.

Without exception, the combination of individual differences in
temperament and job performance variables accounts for more variance in
leadership measures than either set of variables considered alone. However,
the addition of interaction terms offers little advantage. In no case do
they increase the amount of variance accounted for by more than two
percentage points. Further, the interactions have no consistent pattern of
significance.

Finally, the two leadership variables appear to differ in how well they
can be predicted by the independent variables. With the exception of the
Combat job cluster, regardless of which model is tested, the independent
variables account for more variance in Participation than in
Inspiration/Support. Further, Achievement Orientation is a significant
predictor of Participation, but not of Supportive leadership. Thus, in
determining the amount of support a leader will provide, subordinate
attributes may contribute less than other determinants of leadership
behavior (e.g., leader attributes, organizational norms and values, resource
allocation), whereas in most MOS, participation may depend more heavily on
subordinate characteristics.

In summary, soldiers who report receiving higher levels of support from
their superiors tend to receive higher scores in dependability and emotional
stability and are seen by their superiors as effective performers. Further,
soldiers who report more involvement in work related decisions have the
preceding characteristics and are also scored as more achievement oriented.

The present research successfully extended past research in two important
ways. First, it demonstrated in a field setting that performance predicts
reported leadership. Second, although performance affects soldiers'
treatment by their superiors, individual differences in job-related
temperament factors are at least equally important. Further, both sets of
variables make independent contributions to the prediction of reported
leadership. Because treatment by superiors can be predicted from relatively
stable individual differences, supervisory treatment should be expected to
generalize across supervisors and through time. Thus, subordinates who
negotiate more effective relationships with their superiors during the first
tour should be expected to do so throughout their careers. Additionally, it
is likely that future bosses will see these individuals as more effective.

Future research should trace the careers of individuals in the Project A
database to determine if, in fact, these predictions hold. Further, the
present research assumed one-way causality; future research might address the
bi-directional causality of superior-subordinate interactions.

References

Campbell, C. H., Campbell, R. C., Rumsey, M. G., & Edwards, D. C. (October,
1984). Development and Field Test of Task-Based MOS-Specific Criterion
Measures (Technical Report No. [illegible]). Alexandria, VA: U.S. Army
Research Institute for the Behavioral and Social Sciences.

Eaton, N. K., Goer, M. H., Harris, J. H., & Zook, L. M. (October, 1984).
Improving the Selection, Classification, and Utilization of Army Enlisted
Personnel: Annual Report, 1984 Fiscal Year (Technical Report No. 660).
Alexandria, VA: U.S. Army Research Institute for the Behavioral and
Social Sciences.




Graen, G. B. (1976). Role-making processes within complex organizations. In
M. D. Dunnette (Ed.), Handbook of industrial and organizational
psychology. Chicago: Rand McNally.

Graen, G. B., Novak, M. A., & Sommerkamp, P. (1982). The effects of
leader-member exchange and job design on productivity and satisfaction:
Testing a dual attachment model. Organizational Behavior and Human
Performance, 30, 109-131.

Greene, C. N. (1975). The reciprocal nature of influence between leader and
subordinate. Journal of Applied Psychology, 60, 187-193.

Hough, L. M., Kamp, J. D., & Barge, B. A. (1984). Utility of Temperament,
Biodata, and Interest Assessment for Predicting Job Performance: A
Review and Integration of the Literature. Minneapolis: Personnel
Decisions Research Institute.

Hough, L. M., Gast, I. F., White, L. A., & McCloy, R. (1986, August). The
Relation of Leadership and Individual Differences to Job
Performance. Paper presented at the meeting of the American Psychological
Association, Washington, D.C.

Jacobs, T. O. (1971). Leadership and Exchange in Organizations.
Alexandria, VA: Human Resources Research Organization (HumRRO).

McLaughlin, D. H., Rossmeissl, P. G., Wise, L. L., Brandt, D. A., & Wang, M.
(1984). Validation of current Armed Services Vocational Aptitude Battery
[text illegible] composites (Technical Report No. 651). Alexandria, VA:
U.S. Army Research Institute for the Behavioral and Social Sciences.

Novak, M. A. (1985). A study of leader resources as determinants of
leader-member exchange (Doctoral dissertation, University of Cincinnati,
1984). Ann Arbor, MI: University Microfilms International.

Vecchio, R. P., & Gobdel, B. C. (1984). The vertical dyad linkage model of
leadership: Problems and prospects. Organizational Behavior and Human
Performance, 34, 5-20.

Wakabayashi, M., & Graen, G. B. (1984). The Japanese career progress study:
A 7-year follow-up. Journal of Applied Psychology, 603-614.

White, L. A., Gast, I. F., & Rumsey, M. G. (1985). Leader behavior and the
performance of first term soldiers. Paper presented at the meeting of the
Military Testing Association, San Diego, CA.

White, L. A., Borman, W. C., Hough, L. M., & Hoffman, R. G. (1986, August). A
path analytic model of job performance ratings. Paper presented at the
meeting of the American Psychological Association, Washington, D.C.




Table 1

Results of Regression Analyses for Each MOS Cluster

                                     Main Effects

MODEL                                AFQT  ACH   DEP   EMOT  HO    JX    SR
                                     1     2     3     4     5     6     7

Clerical MOS
INSP = IND. DIF.                     NS    *     **    *
INSP = PERF.                                                 NS    NS    **
INSP = PERF + IND. DIF.              NS    NS    **    NS    NS    NS    **
INSP = MAIN EFF. + AFQT*TEMP         NS    NS    NS    NS    NS    NS    **
INSP = MAIN EFF. + TEMP*TEMP         NS    NS    **    NS    NS    NS    **
INSP = MAIN EFF. + ALL INTERACTIONS  NS    *     NS    NS    NS    NS    **

Combat MOS
INSP = IND. DIF.                     NS    NS    **    **
INSP = PERF.                                                 NS    NS    **
INSP = PERF + IND. DIF.              NS    NS    **    **    NS    NS    **
INSP = MAIN EFF. + AFQT*TEMP         **    NS    **    **    NS    NS    **
INSP = MAIN EFF. + TEMP*TEMP         NS    NS    **    **    NS    NS    **
INSP = MAIN EFF. + ALL INTERACTIONS  *     NS    **    **    NS    NS    **

Operations MOS
INSP = IND. DIF.                     NS    **    **    **
INSP = PERF.                                                 NS    NS    **
INSP = PERF + IND. DIF.              NS    **    **    **    NS    NS    **
INSP = MAIN EFF. + AFQT*TEMP         NS    NS    **    **    **    NS    **
INSP = MAIN EFF. + TEMP*TEMP         NS    **    **    **    NS    NS    **
INSP = MAIN EFF. + ALL INTERACTIONS  NS    NS    **    **    NS    NS    **

Skilled Technical MOS
INSP = IND. DIF.                     NS    *     **    **
INSP = PERF.                                                 NS    NS    **
INSP = PERF + IND. DIF.              NS    NS    **    **    NS    NS    **
INSP = MAIN EFF. + AFQT*TEMP         NS    NS    NS    NS    NS    NS    **
INSP = MAIN EFF. + TEMP*TEMP         NS    NS    **    **    NS    NS    **
INSP = MAIN EFF. + ALL INTERACTIONS  NS    NS    NS    NS    NS    NS    **



Participation

Clerical MOS
PART = IND. DIF.
PART = PERF.
PART = PERF + IND. DIF.
PART = MAIN EFF. + AFQT*TEMP
PART = MAIN EFF. + TEMP*TEMP
PART = MAIN EFF. + ALL INTERACTIONS

Combat MOS
PART = IND. DIF.
PART = PERF.
PART = PERF + IND. DIF.
PART = MAIN EFF. + AFQT*TEMP
PART = MAIN EFF. + TEMP*TEMP
PART = MAIN EFF. + ALL INTERACTIONS

Operations MOS
PART = IND. DIF.
PART = PERF.
PART = PERF + IND. DIF.
PART = MAIN EFF. + AFQT*TEMP
PART = MAIN EFF. + TEMP*TEMP
PART = MAIN EFF. + ALL INTERACTIONS

Skilled Technical MOS
PART = IND. DIF.
PART = PERF.
PART = PERF + IND. DIF.
PART = MAIN EFF. + AFQT*TEMP
PART = MAIN EFF. + TEMP*TEMP
PART = MAIN EFF. + ALL INTERACTIONS


NS  **  **  **  NS  **  **  NS  **  *
**  NS  NS  *   NS  *   NS  NS  NS  NS
NS  NS  **  **  **  NS  NS  *   NS  *
NS  NS  NS  NS  NS  NS  **  **  **  NS
NS  **  NS  **  **  *   NS  NS  **  NS
**  **  NS  NS  NS  NS  **  **  *   NS
NS  **  NS  **  NS  NS  NS  **  NS  NS
**  **  **  **  **  **  **  **  NS  **
**  **  NS  **  NS  **  **  **  **  NS
**  *   **  **  **  **  NS  **  NS  **
**  **  **  NS  **  **  *   NS  **  NS
**  **  **  *   NS  **  NS  NS  NS  NS
**  NS  **  NS  **  **  **  **  NS  **
NS  NS  NS  **  NS  NS  **



THE  PROJECT  A  CONCURRENT  VALIDATION  DATA  COLLECTION 


James  H.  Harris 
John  P.  Campbell 
Charlotte  Campbell 

Human  Resources  Research  Organization 


Presented  on  Symposium, 

“Project  A  Concurrent  Validation:  Preliminary  Results" 

At  the  Annual  Conference  of  the 
Military  Testing  Association 
Mystic,  Connecticut 

November  1986 


The  views  expressed  in  this  paper  are  those  of  the  authors  and  do  not 
necessarily  reflect  the  official  opinions  and  policies  of  the  U.S.  Army 
Research  Institute  or  the  Department  of  the  Army. 




The  Project  A  Concurrent  Validation  Data  Collection 

James H. Harris
Human Resources Research Organization

John P. Campbell
University of Minnesota

Charlotte H. Campbell
Human Resources Research Organization

Introduction 

The purpose of this paper is to describe the Project A concurrent validation data
collection and relate some "lessons learned" about the administration of large-scale
data collections. During this data collection, predictor and criterion measures were
administered to approximately 9,500 entry-level soldiers, and rating scales were
administered to approximately 7,000 supervisors of these soldiers. The original Project
A Research Plan specified a concurrent validation target sample size of 600-700 Skill
Level 1 (SL1) job incumbents for each of 19 MOS, using procedures that had been tried
out and refined during the predictor and criterion field tests. The Research Plan
specified 13 data collection sites in the United States (CONUS) and two in Europe
(USAREUR). The number of sites was the maximum that could be visited within the
Project's budget constraints, which dictated that sites be chosen to maximize the
probability of obtaining the required sample sizes. The data collection schedule, by
site, is shown in Figure 1.

The  basic  sampling  plan,  data  collection  team  training,  data  collection  procedures, 
and  lessons  learned  are  presented  in  the  following  sections. 

Sampling  Plan 

The general sampling plan was to use the Army's World-Wide Locator System to
identify all the first-term enlisted personnel in the 19 MOS at each chosen site who
entered the Army between 1 July 1983 and 30 July 1984. If possible,


¹This research was funded by the U.S. Army Research Institute for the Behavioral and
Social Sciences, Contract No. MDA903-82-C-0531. All statements expressed in this
paper are those of the authors and do not necessarily express the official opinions or
policies of the U.S. Army Research Institute or the Department of the Army.


²The material in this paper is from two sources: Campbell, C.H., & Hoffman, R.G. (in
press). Concurrent validation hands-on data collection: Lessons learned. Alexandria,
VA: Human Resources Research Organization (HumRRO).

Human Resources Research Organization, American Institutes for Research, Personnel
Decisions Research Institute, and Army Research Institute (1985). Improving the
selection, classification, and utilization of Army enlisted personnel: Annual Report. ARI
Technical Report ZdiS-  Alexandria, VA: Army Research Institute.




Figure 1. Concurrent validation schedule.


the individual's unit identification was also to be retained. The steps
described below were then followed. The intent was to be as representative as
possible while preserving enough cases within units to provide a "within
rater" variance estimate for the supervisor and peer ratings.

A.  Preliminary Steps

    1.  Identify the subset of MOS (within the sample of 19) for which
        it would be possible to actually sample people within units at
        specific posts. That is, given the entry date "window" and
        given that only 50-75 percent of the people on any list of
        potential subjects could actually be found and tested, what MOS
        are large enough to permit sampling to actually occur? List
        them.

    2.  For each MOS in the subset of MOS for which sampling is
        possible, identify the smallest "unit" from which 6-10 people
        can be drawn. Ideally, we would like to sample 4-6 units from
        each post and 6-12 people from each unit. For the total
        concurrent sample this would provide enough units to average
        out or account for differential training effects and leadership
        climates, while still providing sufficient degrees of freedom
        for investigating within-group effects such as rater
        differences in performance appraisal.

    3.  For the four MOS in the Preliminary Battery (PB) sample,
        identify the members of the PB sample who are on each post.

B.  The ideal implementation would be to obtain the Alpha Roster list of
    the total population of people at each post who are in the 19 MOS
    and who fit our "window." The lists would be sent to the data
    collection manager where the following steps would be carried out.




    1.  For each MOS, randomize units and randomize names within units.

    2.  Select a sample of units at random. The number would be large
        enough to allow for some units being truly unobtainable at the
        time of testing.

    3.  Instruct the Point-of-Contact (POC) at the post to obtain the
        required number of people by starting at the top of the list
        and working down (as in the Batch A field test) within each of
        the designated units. If an entire unit is unavailable, go on
        to the next one on the list.

    4.  In those MOS for which unit sampling is not possible, create a
        randomized list of everyone on the post who fits the window.
        Instruct the POC to obtain the required number by going down
        the list from top to bottom (as in the Batch A field tests).

C.  If it is not possible to bring the Alpha Roster to the data
    collection manager, provide project staff at the post to assist the
    POC in carrying out the above steps.

    1.  If it is not possible to randomize names at the post, first use
        the World-Wide Locator to obtain a randomized list, carry the
        list to the post and use it to sample names from units drawn
        from a randomized list of units. If there are only 6-8 units
        on the post, then no sampling of units is possible. Use them
        all.

D.  If it is not possible for project personnel to visit the post, then
    provide the randomized World-Wide Locator list to the POC and ask
    him or her to follow the sampling plan described above with written
    and telephone assistance. That is, the POC would identify a sample
    of units (for those MOS for which this is possible), match the unit
    roster with the randomized World-Wide Locator list, and proceed down
    each unit until the required number of people was obtained. If the
    POC can generate their own randomized list from the Alpha Roster, so
    much the better. The World-Wide Locator serves only to specify an a
    priori randomized list for the POC.

E.  If none of the above options is possible, then present the POC with
    the sampling plan and instruct him or her to obtain the required
    number of people in the most representative way possible (the Batch
    B procedure).
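The core of the ideal implementation (randomize units, randomize names within units, then work down each list until the quota is met) can be sketched in a few lines. This is only an illustration under stated assumptions: the roster layout (a dictionary from unit to names), the function names, and the `is_available` callback are hypothetical and do not come from the project's procedures.

```python
import random

def randomize_roster(roster, seed=None):
    """Randomize the order of units and of names within each unit.

    roster: dict mapping a unit label to a list of names (hypothetical layout
    for one MOS at one post).
    """
    rng = random.Random(seed)
    units = list(roster)
    rng.shuffle(units)                      # randomize units
    ordered = []
    for unit in units:
        names = list(roster[unit])
        rng.shuffle(names)                  # randomize names within the unit
        ordered.append((unit, names))
    return ordered

def draw_sample(ordered, units_needed, per_unit, is_available):
    """Work down the randomized lists from the top.

    Unavailable soldiers are skipped; a unit with nobody available is skipped
    entirely and the next unit on the list is used instead.
    """
    sample = {}
    for unit, names in ordered:
        if len(sample) == units_needed:
            break
        picked = [n for n in names if is_available(n)][:per_unit]
        if picked:
            sample[unit] = picked
    return sample
```

For example, `draw_sample(randomize_roster(roster, seed=1), units_needed=4, per_unit=6, is_available=found_on_post)` would return up to six names from each of up to four units, where `found_on_post` stands for whatever check the POC applies.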

The final sample sizes are shown by post and by MOS in Figure 2. Note
that it was not always possible in all MOS to find as many as 600 incumbents
with the appropriate accession dates at the 15 sites. Some MOS simply aren't
that big.

Data  Collection  Team  Training 

Each  data  collection  team  was  composed  of  a  Test  Site  Manager  (TSM)  and 
six  or  seven  project  staff  members  who  were  responsible  for  test  and  rating 
scale  administration.  The  teams  were  made  up  of  a  combination  of  regular 




Figure 2. Concurrent validation sample soldiers by MOS by location.


project staff and individuals (e.g., graduate students) specifically
recruited for the data collection effort. The test site manager was an "old
hand" who had participated heavily in the field tests. This team was assisted
by eight NCO scorers (for the hands-on tests), one company-grade officer POC,
and up to five NCO support personnel, all recruited from the post.

The project data collection teams were given three days of training at a
central location. During this period, Project A was explained in detail,
including its operational and scientific objectives. After the logistics of
how the team would operate (transportation, meals, etc.) were discussed, the
procedures for data entry from the field to the computer file were explained
in some detail. Every effort was made to reduce data entry errors at the
outset via correct recording of responses and correct identification of answer
sheets and diskettes.

Next, each predictor and criterion measure was examined and explained.
The trainees took each predictor test, worked through samples of the knowledge
tests, and role played the part of a rater. Considerable time was spent on
the nature of the rating scales, rating errors, rater training, and the
procedures to be used for administering the ratings. All administrative
manuals, which had been prepared in advance, were studied and pilot tested,
role playing exercises were conducted, and hands-on instruction for
maintenance of the computerized test equipment was given.

The intent was that by the end of the three-day session each team member
would  (a)  be  thoroughly  familiar  with  all  predictor  tests  and  performance 
measures,  (b)  understand  the  goals  of  the  data  collection  and  the  procedure 




for avoiding negative critical incidents, (c) have had an opportunity to
practice administering the instruments and to receive feedback, and (d) be
committed to making the data collection as error-free as possible.

As noted above, eight NCO scorers were required for hands-on test scoring.
They were recruited and trained using procedures very similar to those
used at each post in the criterion field tests. Training took place over one
full day and consisted of (a) a thorough briefing on Project A, (b) an
opportunity to take the tests themselves, (c) a check-out of the specified
equipment, and (d) multiple practice trials in scoring each task, with feedback
from the project staff. The intent was to develop high agreement for the
precise responses that would be scored as GO or NO-GO on each step.

Data  Collection  Procedure 

The  data  collection  proceeded  as  follows:  The  first  day  was  devoted  to 
equipment  and  classroom  set-up,  general  orientation  to  the  data  collection 
environment,  and  a  training  and  orientation  session  for  the  post  POC  and  the 
NCO  support  personnel. 

On the first day of actual data collection the soldiers who arrived at
the test site were divided randomly into two equal groups, identified as Group
1 or 2. Each group was directed to the appropriate area to begin the
administration for that group. They rotated under the direction of the test site
manager through the appropriate block according to the schedule.
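A random division into two groups of (nearly) equal size can be sketched as follows. The function name and the handling of an odd head count (the extra soldier goes to Group 1) are assumptions made here for illustration; the paper does not say how an odd number of arrivals was handled.

```python
import random

def split_into_groups(soldiers, seed=None):
    """Shuffle the arrivals and cut the list in half (Group 1, Group 2)."""
    rng = random.Random(seed)
    shuffled = list(soldiers)
    rng.shuffle(shuffled)
    half = (len(shuffled) + 1) // 2   # assumption: Group 1 takes the odd soldier
    return shuffled[:half], shuffled[half:]
```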

For soldiers in a Batch Z MOS, like 12B, the procedure took one day. For
soldiers in a Batch A MOS, like MOS 91A, the procedure was similar but took
two days to rotate the soldiers through the appropriate blocks. The measures
administered in each block are shown in Figure 3.


BATCH A MOS                             BATCH Z MOS
4 Blocks, 4 Hrs. Each                   2 Blocks, 4 Hrs. Each

Block 1                                 Block 1
  Predictor Tests                         Predictor Tests

Block 2                                 Block 2
  School and Job Knowledge Tests          School and Job Knowledge Tests
  Army-Wide Ratings                       Army-Wide Ratings

Block 3
  MOS-Specific Hands-On Tests

Block 4
  MOS Ratings
  MOS-Specific Written Tests

Figure 3. Concurrent validation test outline.




Lessons  Learned 


Collecting data from 16,000 soldiers in 15 locations over six months is a
difficult task, one that requires careful planning, attention to detail, an
ability to adapt, a fondness for crisis management, and a special relationship
with the telephone. For anyone planning an effort of like grandeur (or even
grander), a few lessons learned from some of the survivors seem appropriate.
We divide the lessons into three categories: planning, coordinating, and
operating. Each category is briefly discussed below.

Planning. Start as early as possible (18 months before collecting data)
to identify the support you will need, to include personnel, equipment,
facilities, and time requirements. Once you know what you need and when you
need it, schedule a series of briefings with the Commanders. Start at the top
with the CG of FORSCOM, TRADOC, and USAREUR and work your way through a series
of briefings until you reach the local POC responsible for seeing that you get
what you need when you need it. Be prepared to change your plans at each step
to meet local concerns. Once you meet and brief your POC, you can begin
coordinating.

Coordinating. The closer the time to begin data collecting, the more
frequently you will speak to the POC. Expect to speak daily when you get
within 30 days of data collection. In some instances, you may have to make a
trip to the installation for a final coordination meeting. Be prepared to be
very flexible with regard to the installation's internal schedule.

Operating. Most of the lessons learned in this category have to do with
hands-on testing.

1.  Many instances of equipment variation can be (and were) anticipated.
Test developers and site coordinators must find out what major pieces of
equipment are not likely to be available at the selected sites in advance of
actual testing if high quality tracked tests are to be prepared.

2.  Printed  scoresheets  must  be  proofed  carefully  to  ensure  that  for 
every  step  which  should  be  scored,  a  score  can  be  recorded. 

3.  Scorers must be thoroughly trained, not only on how to set up and
administer the tests, but also on how to record data on the scoresheets. They
must be given practice in using the scoresheets (not just talked through it)
before testing, and monitored closely during testing, especially with the
first few soldiers tested. Continual monitoring must also occur throughout
the testing.

4.  Scorers  and  hands-on  managers  must  document  meticulously  who  was 
tested  on  what,  and  also  who  wasn't  tested  on  what,  and  why. 

5.  Experienced hands-on managers are often able to implement procedures
to deal with equipment malfunctions or variations, but these too must be
documented.

6.  Completed  scoresheets  must  be  checked  as  soon  as  possible  after 
testing  so  that  careless  or  incorrect  scoring  can  be  detected,  and  the  errant 
scorer  can  be  retrained. 
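The scoresheet checks in lessons 2 and 6 amount to confirming that every step that should be scored carries a valid entry. A minimal sketch of such a check is given below; the data layout (a dictionary from step label to recorded entry) and the function name are hypothetical and are not taken from the project's materials.

```python
VALID_ENTRIES = {"GO", "NO-GO"}

def unscored_steps(scoresheet, scorable_steps):
    """Return the scorable steps that have no valid GO/NO-GO entry.

    scoresheet: dict mapping step label -> recorded entry (hypothetical layout).
    scorable_steps: the steps the test says should be scored, in order.
    """
    return [step for step in scorable_steps
            if scoresheet.get(step) not in VALID_ENTRIES]
```

A sheet for which this returns a non-empty list would be flagged for follow-up with the scorer while the testing session is still fresh.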




POST  DIFFERENCES  IN  HANDS-ON  TASK  TESTS 


R. Gene Hoffman

Human  Resources  Research  Organization 


Presented on Session, "Issues in Hands-On Performance Testing"

At  the  Annual  Conference  of  the 
Military  Testing  Association 
Mystic,  Connecticut 

November  1986 


The views expressed in this paper are those of the authors and do not
necessarily  reflect  the  official  opinions  and  policies  of  the  U.S.  Army 
Research  Institute  or  the  Department  of  the  Army. 




Post  Differences  in  Hands-On  Task  Tests 


R.  Gene  Hoffman 

Human  Resources  Research  Organization 


One of the major efforts for the U.S. Army's Selection and
Classification Project (Project A) has been the development of hands-on
performance measures. The effort required preparation of tests to cover
approximately 15 tasks for soldiers in nine different job specialties (MOS).
Because of equipment differences within certain MOS, it was necessary to
create alternate versions of some tests. Thus, 103 different task tests were
prepared. Eleven tests were used in more than one MOS, with the number of
tests per MOS ranging from 14 to 27. As part of the concurrent validation
data collection effort, these tests were administered during 1985 to
approximately 500 to 600 soldiers per MOS. In order to collect that volume
of data, test sites included 13 different Army posts in the United States
plus European test sites. At the European sites, approximately 120 soldiers
for each MOS were tested. At the CONUS sites, the numbers of soldiers per
MOS per site ranged from 9 to 110, with typical numbers being near 30, near 45,
or near 60 because of scheduling requirements. The tests were administered
in blocks of two to four tasks per test station, with typically one NCO in the
respective MOS at each site handling test administration for all soldiers at
any given station.

Given these "road show" requirements for data collection, considerable
effort was made to standardize the hands-on testing procedures. These
efforts included attention to test set-up and scoring instructions and to the
training of test administrators. Prior to concurrent data collection, test
procedures were pilot tested on a small sample of soldiers using four to five
test administrators and then field tested on approximately 150 soldiers.
Administrator training included five phases: (1) presentation of general
testing principles, (2) familiarization with individual test station
requirements, (3) practice, (4) review by contractor personnel prior to data
collection, and (5) monitoring by contractor personnel during data
collection. Further details concerning test construction and
administration are presented in Campbell et al. (1985) and Campbell (in
preparation).

Given that hands-on testing has a history of being susceptible to scorer
differences (e.g., Maier, 1983), this paper examines differences between
posts in hands-on test scores and the extent to which any such post
differences are not "real" differences, but are, in some way, artifacts of
the measurement process. Thus, analyses examined alternative sources of
variance in hands-on test scores that could account for any mean differences
between posts. Candidate measures for explaining differences available in
the Project A data set include: (1) written tests, (2) supervisor and peer
ratings of performance, (3) practice, (4) time in service, and (5) ability.
Post effects were estimated after variance due to these measures was removed
from the hands-on tests (using hierarchical multiple regression) and compared
with post effects prior to any adjustment.




Analysis 


Analyses were conducted for every hands-on test in all nine MOS. No
adjustment was made for tests appearing in more than one MOS. That is,
repeated tests were treated as separate observations. Thus, there were 147
observations of post differences where an observation is a test/MOS
combination. The first series of analyses estimated unadjusted post effects
(percent of variance in hands-on score accounted for by post alone) and post
effects adjusted for written test scores (except, obviously, those tasks
tested only in the hands-on mode), task ratings by peers and by supervisors,
overall performance ratings by peers and supervisors, practice (composite of
self ratings of recency and frequency of task performance), time in service
(test date minus entry date), and general ability (AFQT). In conducting
these analyses, significant reductions in sample sizes between post only and
adjusted post analyses were observed for all MOS. The reductions were most
attributable to missing ratings. Therefore, an alternative or "reduced"
adjustment model was also examined in which ratings were excluded. Thus, for
each of the 147 tasks, three different R²s were calculated between post and
hands-on scores: (1) an unadjusted "post alone" R², (2) an adjusted R² for
post after all other variables in the "full model" were controlled, and (3)
an adjusted R² for post after all other variables in the "reduced (ratings
excluded) model" were controlled. Adjusted R²s were calculated as the
increase in R² when post was added after all control variables in a
hierarchical multiple regression predicting hands-on score. Mean sample
sizes for these analyses were 500.21 for post alone, 164.01 for the "full
model" (i.e., all variables) and 341.77 for the "reduced model."
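The "post alone" R² and the adjusted R² (the increase in R² when post is entered after the control variables) can be illustrated with ordinary least squares. The sketch below uses NumPy and indicator (dummy) coding for post; the function names, the dummy coding, and the array layout are assumptions made for illustration, since the paper does not describe the software or coding actually used.

```python
import numpy as np

def r_squared(X, y):
    """R^2 from an ordinary least-squares fit of y on X plus an intercept."""
    X = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    ss_total = np.sum((y - y.mean()) ** 2)
    return 1.0 - (resid @ resid) / ss_total

def post_dummies(post):
    """Indicator columns for post, dropping the first level as the reference."""
    levels = np.unique(post)
    return (post[:, None] == levels[None, 1:]).astype(float)

def post_effects(y, post, controls):
    """Return (post-alone R^2, increase in R^2 when post follows the controls)."""
    dummies = post_dummies(post)
    r2_post_alone = r_squared(dummies, y)
    r2_controls = r_squared(controls, y)
    r2_controls_plus_post = r_squared(np.column_stack([controls, dummies]), y)
    return r2_post_alone, r2_controls_plus_post - r2_controls
```

Here `y` would be the hands-on score for one task, `post` an array of post codes, and `controls` a two-dimensional array holding whichever control variables belong to the full or reduced model.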

The R²s between post alone and hands-on scores estimate the extent of
between post differences in hands-on scores. These were compared to the R²s
for post and hands-on scores after variance due to the other variables in the
full and reduced models were controlled. Differences in variance accounted
for by post (i.e., differences in R²s) were calculated as indices of the bias
resulting from post differences. Thus, two bias indices for each hands-on
test resulted from these analyses: a "full model" bias and a "reduced model"
bias. The term bias has been used in response to the question: "Would
standardizing hands-on scores by post bias those scores?" Positive values
for these bias indices would suggest that any post differences are to some
extent real and that standardizing would introduce bias. On the other hand,
near zero values suggest that post differences are unrelated to other
measurements of performance, therefore may reflect measurement error, and
that standardization may be justified.
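Written out, the adjusted post effect and the bias index for a single task take the following form. The notation is introduced here only for illustration and is not the author's: C stands for the set of control variables in either the full or the reduced model.

```latex
\Delta R^2_{\text{post}\mid C} = R^2_{C+\text{post}} - R^2_{C},
\qquad
\text{bias}_{C} = R^2_{\text{post alone}} - \Delta R^2_{\text{post}\mid C}
```

As a purely hypothetical example, a task with a post-alone R² of .20 and an adjusted post effect of .19 would have a bias index of .01, meaning the controls absorb almost none of the variance associated with post.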

The above analyses were conducted on a task by task basis. From these
analyses it is not possible to tell whether the "post" effects are actually
at the post level or are more correctly attributable to scorer differences.
Two approaches were used to address this question, neither of which is
definitive. First, if "post" effects (within an MOS) were operating
consistently for all tasks within an MOS (e.g., motivational differences
between posts), then it should be possible to account for post variance in
any one task by removing variance associated with the hands-on test scores
for other tasks within each MOS. Thus, for each task, an adjusted R² for post
effects was examined after variance associated with other MOS tasks was
removed. An "other tasks" bias index was constructed as the difference




between post effects alone and this "other tasks" adjusted R². If this index
is near zero, the "post" effects are task specific and not consistent across
tasks within an MOS.

A  second  way  to  partially  dissect  the  task  by  task  post  effects  is  to 
examine  scorer-within-post  variance  capitalizing  on  the  instances  where  two 
or  more  scorers  scored  the  same  test  at  the  same  post  either  by  general 
design (i.e., duplicate equipment and test stations in the test plan) or by
local  variation  (i.e.,  an  early  finishing  scorer  helping  at  another  station). 

The  series  of  analyses  examining  post  effects  controlling  for 
performance  on  other  hands-on  tests  occurred  some  time  after  the  first,  and 
in  that  interval  two  91A  tracked  tests  were  merged;  therefore  146  separate 
tasks  were  analyzed.  Again  a  "bias"  variable  was  calculated  as  the 
difference  between  post  effects  alone  and  adjusted  post  effects. 


Results 


Results for these analyses are summarized in Table 1 below. All data
points were either R²s (for the Post Only analyses), increases in R²s (for
the full, reduced, and other tasks model analyses), or differences between
R²s (for the bias variables). Thus, table entries are the means, standard
deviations, minimums, and maximums for these R²s across the 147 tasks.

Uncorrected post differences account for an average of 19% of the
variance in hands-on test scores, indicating the presence of post differences
in hands-on scores. Post effects range from 2% to 50%. For only 36 of the
147 tasks is the post effect less than 10% of the hands-on variance.
Furthermore, there is no evidence that post differences can be consistently
attributed to written test scores, practice, ratings, ability, or time in
service. Mean biases from the full and reduced model analyses are both very
near zero, suggesting that removing post differences by standardization would
not bias the hands-on scores.

Table 1

Hands-On Test Variance (R²) Associated With Post
With and Without Controls and Associated Adjustment Bias


             Variance Associated with Post          Standardization Bias

           Post     Full     Reduced   Other      Full     Reduced   Other
           Only     Model    Model     Tasks      Model    Model     Task
           Model                       Model      Bias     Bias      Model
                                                                     Bias

Mean R²    0.19     0.22     0.18      0.12      -0.03     0.01      0.07
S.D. R²    0.11     0.11     0.11      0.08       0.08     0.05      0.06
Min. R²    0.02     0.01     0.01      0.01      -0.25    -0.24     -0.04
Max. R²    0.50     0.52     0.52      0.34       0.24     0.23      0.33



Results  for  the  "other  tasks"  model  are  presented  in  Table  1.  Bias  as 
estimated  by  this  model  is  somewhat  larger  than  the  others  and  suggests  that 
to  some  extent  post  differences  for  any  given  task  are  related  to  post 
differences  for  other  tasks.  However,  certainly  not  all  of  the  task  level 
post  effects  are  explained. 

Table  2  indicates  that  the  147  tasks  are  rather  homogeneous  with  regard 
to  reduced  model  bias.  For  the  147  tasks,  114  reduced  model  bias  indices  are 
between  -.05  and  .05.  The  other  bias  indices  are  similarly  homogeneous. 

Thus,  the  post  effects  that  are  present  remain  so  after  attempts  to  explain 
them  are  considered  and  that  trend  is  consistent  across  all  tasks. 

Table 2

Distribution of Reduced Model Bias Across 147 Hands-On Tests


Reduced Model Bias     Frequency     Percent

-0.30 to -0.25              0          0.00
-0.25 to -0.20              1          0.68
-0.20 to -0.15              0          0.00
-0.15 to -0.10              4          2.72
-0.10 to -0.05              6          4.08
-0.05 to  0.00             56         38.10
 0.00 to  0.05             58         39.46
 0.05 to  0.10             18         12.24
 0.10 to  0.15              1          0.68
 0.15 to  0.20              2          1.36
 0.20 to  0.25              1          0.68

The final analysis made use of the duplication of scorers for some tasks
at some posts. Because this duplication was not systematically planned, some
instances of duplication of scorers were due to a scorer at one post scoring
only one or two soldiers. Such cases are not very illuminating. To avoid
them, only tasks for which degrees of freedom for scorers-within-post were at
least 5 were examined. Forty tasks met this criterion (degrees of freedom
ranged from 5 to 23). For these tasks, the mean scorers-within-post effect
accounted for 4.6% of the hands-on variance. This number probably
underestimates the size of the scorer effect because post effects were still
confounded by scorer effects. That is, for all but a few tasks in this
analysis, several posts were represented by only one scorer. For the
thirteen tasks with 10 or more degrees of freedom for scorers within post
(and fewer posts with only one scorer), 6.4% of the hands-on variance is
associated with scorer differences. While it is not possible to totally
disentangle post versus scorer differences, it is probably safe to conclude
that there were consistent scorer differences, and that some of the
differences among posts are attributable to scorer differences.
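
The scorers-within-post share of variance can be illustrated with a small nested sum-of-squares calculation. This is a sketch under assumptions: the function names are hypothetical, and the partition shown (scorer-by-post cells minus posts) is one standard way to compute a nested effect, not necessarily the procedure used in the original analysis.

```python
def between_ss(y, groups):
    """Between-group sum of squares for the grouping given by `groups`."""
    grand = sum(y) / len(y)
    cells = {}
    for value, g in zip(y, groups):
        cells.setdefault(g, []).append(value)
    return sum(len(v) * (sum(v) / len(v) - grand) ** 2 for v in cells.values())


def scorer_within_post(y, post, scorer):
    """Proportion of score variance due to scorers nested within posts.

    Returns (proportion of variance, degrees of freedom), where the degrees
    of freedom equal the number of scorer-by-post cells minus the number of
    posts.
    """
    grand = sum(y) / len(y)
    ss_total = sum((v - grand) ** 2 for v in y)
    ss_post = between_ss(y, post)
    cells = list(zip(post, scorer))
    ss_cells = between_ss(y, cells)
    df = len(set(cells)) - len(set(post))
    return (ss_cells - ss_post) / ss_total, df
```

A post represented by a single scorer contributes no degrees of freedom here, which is why such posts leave scorer and post effects confounded.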

These analyses unfortunately are like trying to show that something does
not exist when we can look in only so many places. That is, we are trying to
rule out alternative explanations for the post effects while we are limited
in the availability of ways to look. Given the evidence, unwanted post
effects at the task level cannot be ruled out, and the standardization of
hands-on test means by post appears justified.

One may wonder what might be the negative consequences if the decision
to standardize by post is incorrect. The most damaging consequence would be
an introduction of error leading to a reduction in the predictability of
hands-on measures. To shed some light on this possibility, the
predictability of standardized and unstandardized hands-on test scores was
compared using the reduced model variables (i.e., R²s for predicting hands-on
tests from written tests, experience, practice, time, and ability). Across
the 147 tasks, the average difference between the two is .02, with the
standardized hands-on scores being slightly less predictable. The standard
deviation of the difference across the 147 tasks is .05. Thus, across the
tasks, standardizing has little effect one way or the other on the
predictability of the hands-on scores.
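
For concreteness, standardizing scores by post might look like the sketch below. It assumes a z-score within each post (both mean and standard deviation adjusted, using the population standard deviation); the text refers to standardizing "means by post", so the exact transformation used in the original work is not certain. The function name is hypothetical.

```python
import math


def standardize_by_post(scores, post):
    """Convert each score to a z-score relative to its own post."""
    groups = {}
    for s, p in zip(scores, post):
        groups.setdefault(p, []).append(s)
    stats = {}
    for p, vals in groups.items():
        m = sum(vals) / len(vals)
        sd = math.sqrt(sum((v - m) ** 2 for v in vals) / len(vals))
        stats[p] = (m, sd)
    out = []
    for s, p in zip(scores, post):
        m, sd = stats[p]
        # A post with no score variation gets zeros rather than a division error.
        out.append((s - m) / sd if sd > 0 else 0.0)
    return out
```

After this transformation every post has the same mean, so any remaining relationship with predictors reflects within-post differences only.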


Summary

In summary, post effects on hands-on scores were present and no
alternative explanation of those effects was found. This leaves the
implication that the post differences reflect error in the measurement
process. Second, the post effects seem to be operating idiosyncratically at
the task level, i.e., as post or scorer effects unique to each task,
rather than as post level effects consistent for all tasks in an MOS.
Third, while it is not possible to totally disentangle post and scorer, some
of the between-post differences are probably due to scorer differences.
Fourth, post differences should be controlled in further statistical analyses
of hands-on test scores. And finally, even if this conclusion is incorrect,
statistical correction by standardizing by post will not have a grave impact
on the predictability of the hands-on scores.


References 


Campbell, C. H. (1986). Developing basic criterion scores for hands-on
tests, job knowledge tests, and task rating scales (In preparation).
Alexandria, VA: Human Resources Research Organization.

Campbell, C. H., Campbell, R. C., Rumsey, M. G., and Edwards, D. C. (1985).
Development and field test of task-based MOS-specific criterion measures
(ARI Technical Report 717). Alexandria, VA: U.S. Army Research
Institute for the Behavioral and Social Sciences.

Maier, M. H. (1983). Using job performance tests as criteria for validating
qualification standards (Memorandum CNA 83-3123.09). Alexandria, VA:
Center for Naval Analyses.

This research was funded by the U.S. Army Research Institute for the
Behavioral  and  Social  Sciences,  Contract  No.  MDA903-82-C-0531.  All 
statements  expressed  in  this  paper  are  those  of  the  authors  and  do  not 
necessarily  express  the  official  opinions  or  policies  of  the  U.S.  Army 
Research  Institute  or  the  Department  of  the  Army. 




ESTIMATES  OF  TASK  PARAMETERS  FOR 
TEST  AND  TRAINING  DEVELOPMENT 


R. Gene Hoffman
Patrick Ford

Human Resources Research Organization


Presented on Session, "Issues in Job Analysis"

At  the  Annual  Conference  of  the 
Military  Testing  Association 
Mystic,  Connecticut 

November  1986 


The  views  expressed  in  this  paper  are  those  of  the  authors  and  do  not 
necessarily reflect the official opinions and policies of the U.S. Army
Research Institute or the Department of the Army.




Estimates  of  Task  Parameters 
for  Test  and  Training  Development 

R.  Gene  Hoffman  and  Patrick  Ford 
Human  Resources  Research  Organization 


The Army's Project A is a large-scale effort to validate the ASVAB and a
battery of new selection and classification tests for enlisted soldiers. The
effort requires comprehensive job performance measures as validation
criteria. In the early stages of the project the domains of nine selected
MOS were described to allow the selection of performance variables which
could be translated into reliable and representative samples of those
performance domains. The problem was to narrow down large domains. The
problem is a familiar one in the military context in both the testing and
training arenas. That is, job analyses have already been conducted and
doctrinal directives written which specify at great length the tasks which
soldiers in each MOS are supposed to be able to perform. Far more tasks
are designated as part of the job than any particular training or testing
program can cover.

To reduce the task domains for Project A, five task parameters were
identified as potentially significant for the selection of sets of
representative tasks. These include (1) the relative importance among the
tasks, (2) the similarities among the tasks, (3) the performance frequency of
each task, (4) the difficulty of each task, and (5) the variability in
performance for each task. Details concerning all of these parameters and
how they were used in task selection are reported elsewhere (HumRRO & AIR,
1984) and will not be repeated. Our focus is retrospective. Performance
measures have been constructed and administered to approximately 400 to 650
soldiers in each of the nine MOS. This provides the opportunity to examine
the validity of the task selection data for three of the task parameters:
(1) task difficulty, (2) task variability, and (3) task frequency.


Data  Base 


The  "population"  for  this  analysis  is  tasks  rather  than  people,  and  the 
sample  is  the  overlap  between  the  set  of  tasks  for  which  hands-on  performance 
tests  were  administered  during  Project  A's  concurrent  validation  phase  and 
the  AOSP  task  list  as  refined  for  task  selection  uses  (Campbell,  et  al., 
1985).  Some  adjustments  were  necessary  because  equipment  variation 
necessitated  the  use  of  alternative  test  forms  whereas  AOSP  statements  were 
equipment  generic.  Thus,  135  tasks  spanning  the  nine  MOS  were  included  in 
the analysis.

Difficulty  and  variability  task  parameters  were  estimated  during  task 
selection  using  a  single  rating  scale.  For  each  AOSP  task  within  their 
respective  MOS,  subject  matter  experts  (SME;  Ns  ranged  from  10  to  26  for  the 
nine  MOS)  were  asked  to  describe  the  performance  distribution  of  soldiers. 
They were asked to indicate: "Out of 10 soldiers, how many can do the task:
(1) All of the time?, (2) Most of the time?, (3) About half of the time?, (4)
Less than half of the time?, or (5) Never?" SMEs were also given an escape
option of "Not observed." Each set of SME responses therefore represented a
frequency distribution of task performance. By assigning performance values
(1 to 5) to the response intervals, a performance mean and standard deviation
were computed for each task for each SME. For each task, these individual SME
means and standard deviations were averaged across SMEs, excluding SMEs who
responded with "Not observed." Thus, the average SME mean and average SME
standard deviation became the difficulty and variability parameters used in
the task selection process. Interrater reliabilities within each MOS were in
the .70s and .80s for task difficulty and in the .50s and .60s for task
variability for the nine MOS. SMEs (generally E-6 to E-7) rated approximately
150 to 300 tasks within their MOS. Further details are presented in HumRRO
and AIR (1984).
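
The derivation of the difficulty and variability parameters from SME responses can be sketched as follows. This is an illustration under assumptions: the values 1 to 5 are assigned in the order the response intervals are listed, the standard deviation is computed over the soldiers the SME distributed (population form), and the function names are hypothetical.

```python
import math


def sme_task_parameters(counts):
    """Mean and SD of one SME's performance distribution for one task.

    counts[i] is the number of soldiers (out of 10) the SME placed in
    response interval i + 1.
    """
    total = sum(counts)
    values = range(1, len(counts) + 1)
    mean = sum(v * c for v, c in zip(values, counts)) / total
    var = sum(c * (v - mean) ** 2 for v, c in zip(values, counts)) / total
    return mean, math.sqrt(var)


def average_across_smes(responses):
    """Average SME means and SDs for a task, skipping 'Not observed' (None)."""
    params = [sme_task_parameters(r) for r in responses if r is not None]
    n = len(params)
    difficulty = sum(m for m, _ in params) / n
    variability = sum(s for _, s in params) / n
    return difficulty, variability
```

For example, an SME who says five of ten soldiers can do the task all of the time and five most of the time yields a mean of 1.5 and a standard deviation of 0.5 under this coding.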

Task  frequency  data  used  in  task  selection  were  taken  directly  from  the 
AOSP  survey  results  for  skill  level  one  soldiers.  The  specific  index  was  the 
percent  of  soldiers  reporting  that  they  performed  each  task. 

On  the  criterion  side  of  this  validation,  actual  test  statistics  from 
the  concurrent  validation  data  collection  provide  task  difficulty  and 
variability estimates. Performance on these tasks was assessed using four
modes: (1) hands-on tests, (2) written tests, (3) peer ratings, and
(4) supervisor ratings. Means and standard deviations for all four
measurement  modes  were  used  as  criteria  against  which  SME  derived  estimates 
were  compared.  Hands-on  and  written  test  scores  were  percent  correct  for 
either  steps  or  items.  Performance  ratings  were  given  by  both  peers  and 
supervisors  on  a  7-point  scale  ranging  from  "among  the  very  worst"  to  "among 
the  very  best"  at  the  end  points  with  "about  the  same  as  others"  at  the 
midpoint. 

Project  A  concurrent  validation  also  included  a  job  history 
questionnaire  completed  by  each  soldier.  For  each  task  in  the  hands-on  test 
sample,  the  questionnaire  asked  soldiers  to  describe  on  a  five  point  scale 
how  recently  they  had  performed  the  task  and  how  frequently  in  the  past  six 
months  they  had  performed  the  task.  These  responses,  averaged  across 
soldiers,  provide  an  independent  assessment  of  task  experience  for  validating 
AOSP  frequency  data. 

Convergence  between  task  selection  data  and  concurrent  validation 
measurement  data  was  assessed  with  simple  correlations.  Correlations 
within  each  MOS  and  across  all  MOS  are  reported. 

For MOS level correlations for task difficulty and variability
estimates, Ns range from 13 to 17 tasks for hands-on and ratings measures,
and  12  to  16  for  written  measures.  Not  all  MOS  had  the  same  number  of 
hands-on  tests  and  for  six  tasks  there  was  no  matching  written  test.  One 
task  had  no  matching  rating.  Across  the  nine  MOS,  the  total  numbers  of  tasks 
were  135  for  correlations  involving  hands-on  data,  129  for  correlations 
involving  written  tests  and  134  for  correlations  involving  ratings.  Since 
AOSP  frequency  data  were  not  available  for  all  tasks,  MOS  level  correlations 
of  task  experience  were  based  on  Ns  which  ranged  from  10  to  15,  with  a  total 
of  108  tasks  across  all  MOS. 




Results 


Table 1 presents correlations between SME estimates and data-based
estimates of task difficulty. At the MOS level the correlations fluctuate
from -.04 to .95, and given the small Ns on which these correlations are
computed, such large fluctuations are expected. Confidence interval estimates
depend on sample size and the size of the observed correlation, and are not
symmetrical. For simplicity, however, it is useful to use one central
confidence interval for reviewing a set of correlations. Thus, the 95
percent confidence interval, using the lowest N (12) and an average r near
.50, is r = -.10 to r = .84, which is not very different from the range
observed in Table 1. Across all MOS, SME ratings of task difficulty are more
predictive of rating means as given by peers and supervisors than of written
and hands-on test score means. The .95 confidence interval for total sample
correlations using the lowest N (129 for written tests) and an average
r = .50 is r = .36 to r = .62. Thus, the variation among the correlations is
not greater than chance.
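
The intervals quoted above are close to what the Fisher z transformation gives. The sketch below assumes that method, since the paper does not state how its intervals were computed; the function name `fisher_ci` is hypothetical.

```python
import math
from statistics import NormalDist


def fisher_ci(r, n, level=0.95):
    """Approximate confidence interval for a correlation via Fisher's z."""
    z = math.atanh(r)                 # transform r to the z scale
    se = 1.0 / math.sqrt(n - 3)       # standard error on the z scale
    crit = NormalDist().inv_cdf(0.5 + level / 2.0)
    return math.tanh(z - crit * se), math.tanh(z + crit * se)


# Interval for the smallest within-MOS sample and for the total sample.
small_sample = fisher_ci(0.50, 12)
total_sample = fisher_ci(0.50, 129)
```

Because the transformation is nonlinear, the resulting interval is asymmetric around r, which is the asymmetry the text mentions.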


Table 1

Correlations Across Tasks Between
SME Means and Measurement Mode Means
For Each MOS and Total Sample


         Hands               Peer      Sup.
MOS      On       Written    Rating    Rating

11B      0.50      0.21      0.69      0.80
13B      0.92      0.70      0.81      0.82
19E      0.54      0.47      0.95      0.93
31C      0.58      0.13      0.83      0.86
63B     -0.04      0.07      0.69      0.56
64C      0.34      0.51      0.49      0.65
71L      0.71      0.66      0.36      0.30
91A      0.30     -0.11      0.65      0.74
95B      0.21      0.15      0.29      0.31

TOTAL    0.43      0.33      0.59      0.62

Table 2

Correlations Across Tasks Between SME
Standard Deviations and Measurement
Mode Standard Deviations For Each MOS
and Total Sample


         Hands               Peer      Sup.
MOS      On       Written    Rating    Rating

11B      0.62      0.34      0.86      0.68
13B      0.75      0.37      0.77      0.77
19E      0.51      0.52      0.28      0.17
31C      0.60      0.28      0.17      0.54
63B      0.16      0.14      0.07     -0.02
64C      0.26      0.12      0.87      0.82
71L      0.22      0.39      0.70      0.30
91A      0.25      0.06      0.18      0.68
95B      0.50      0.39      0.33      0.48

TOTAL    0.35      0.26      0.42      0.48

Table 2 presents the analogous correlations between SME estimates of
task variability and data-based estimates. Again at the MOS level the
correlations fluctuate from -.02 to .86. Again, however, correlations do not
vary more than expected by chance.

For  reference,  intercorrelations  among  task  means  and  among  task 
standard  deviations  are  presented  in  Tables  4  and  5  in  an  Appendix. 

Table 3 presents correlations between Project A frequency and recency
and AOSP task experience estimates, as well as correlations between an
unweighted linear composite of the frequency and recency with AOSP frequency.
Looking at the composite, correlations range from .02 to .90 for the
within-MOS data (.95 confidence interval for an average r = .56 is r = -.10
to r = .88). Across all MOS, frequency and recency means for the 108 tasks
each correlate .46 with AOSP frequency (.95 confidence interval is r = .31 to
r = .58). Frequency and recency means correlated .91 with each other, so
that using a composite of the two does little to strengthen the relationship
between the two sets of experience data.
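
An unweighted composite of two highly correlated measures adds little, as the following sketch illustrates. It assumes the composite is the sum of z-scored frequency and recency means, which is one common reading of "unweighted linear composite"; the paper does not specify the construction, and the function names are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def zscores(x):
    """Standardize a sequence to mean 0 and (population) SD 1."""
    n = len(x)
    m = sum(x) / n
    sd = math.sqrt(sum((a - m) ** 2 for a in x) / n)
    return [(a - m) / sd for a in x]


def unit_weighted_composite(freq, rec):
    """Sum of z-scored frequency and recency task means."""
    return [f + r for f, r in zip(zscores(freq), zscores(rec))]
```

When the two components correlate near 1.0, the composite is almost a rescaled copy of either component, so its correlation with a third variable barely changes.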


Table 3

Correlations Across Tasks Between AOSP Frequencies and
Job History Responses for Each MOS and Total Sample


MOS      Frequency    Recency    Composite

11B        0.85        0.90        0.88
13B        0.55        0.46        0.52
19E        0.53        0.43        0.50
31C        0.14        0.09        0.13
63B        0.00        0.05        0.02
64C        0.65        0.81        0.76
71L        0.11       -0.08        0.02
91A        0.88        0.92        0.90
95B        0.49        0.58        0.53

TOTAL      0.46        0.46        0.47

Discussion 


Results indicate that, in the absence of hard performance data, SME
estimates can provide reasonably valid, though certainly not perfect,
estimates of difficulty and variance. Given validity coefficients in the
.40 to .60 range, SME estimates of task difficulty can be useful for making
gross judgments differentiating particularly hard or easy tasks. In essence,
that was the use made of the SME difficulty estimates during task selection,
with the very hard and the very easy tasks generally not selected for
testing. Thus, there is some degree of range restriction in the SME ratings
used in the present analysis, and the validity of the SME estimates may be
understated.

The strength of the relationship between SME task difficulty and
performance rating means is interesting in light of the performance rating
scale. Theoretically the scale should have led to means near the midpoint
for every task, with near-zero variance across tasks. Realistically, our
knowledge of common rating errors led us to hedge our bets here. Thus, we
analyzed the performance rating means expecting to find convergence with SME
means. Even though the standard deviations across tasks of the rating means
were restricted to .28 and .36 for peers and supervisors, respectively, the
variance in task means that did exist was strongly associated with SME task
difficulty estimates. Raters apparently had a hard time making purely
normative judgments. That is, raters may have been reluctant to give average
or below average ratings on tasks that almost all soldiers perform well.

Validities for the SME estimates of performance variability are lower.
Intercorrelations  among  all  estimates  of  task  variability  show  a  similar 
reduction  (compare  Tables  4  and  5  in  the  Appendix).  Thus,  relative 
differences  among  tasks  in  variance  seem  more  affected  by  test  mode  than  do 
their  relative  differences  in  difficulty.  This  makes  SME  estimates  of  task 
performance  variability  less  useful  for  task  selection. 

Project A and AOSP estimates of task frequency show modest convergence,
perhaps more limited than might be expected from two self-reports of
essentially the same phenomenon: participation in various tasks. There are,
however, several differences between the two which may have reduced their
convergence. First, they provide different experience indices (percent of
soldiers who do a task from AOSP data versus average number of times a task
is done from Project A data), which may have distorted the relative
distributions for tasks done as a daily part of the job (e.g., type a DF for
71L clerks) versus tasks practiced only during set training periods (e.g.,
load, reduce a stoppage, and clear an M16). Second, the surveys were
conducted at different times (several years apart for some MOS), and any
instability over the intervening time periods would reduce convergence. This
was the case for two MOS with low experience convergence (31C and 63B), where
preparation of task tests was more cumbersome than for other MOS because of
the variety and continuing evolution of equipment. Finally, AOSP estimates
were based on a sample of the entire first tour, while Project A estimates
were based on soldiers representing a more limited range of one to two years
time in service. As soldiers increase in time in service, their job duties
may expand and change. The distinctions between the two surveys are important
caveats for interpreting either set of experience data.


References 


Campbell, C. H., Campbell, R. C., Rumsey, M. G., and Edwards, D. C. (1985).
Development and field test of task-based MOS-specific criterion measures
(ARI Technical Report 717). Alexandria, VA: U.S. Army Research
Institute for the Behavioral and Social Sciences.

Human Resources Research Organization (HumRRO) and American Institutes for
Research (AIR) (1984). Selecting job tasks for criterion tests of MOS
proficiency (ARI Working Paper RS-WP-84-25). Alexandria, VA: U.S. Army
Research Institute for the Behavioral and Social Sciences.




Appendix 


Table 4

Intercorrelations Among Measurement Mode Means


                                             Peer      Supervisor
Mode                 Hands-On    Written     Rating    Rating

Hands-On               1.00
Written                0.52        1.00
Peer Rating            0.58        0.40      1.00
Supervisor Rating      0.53        0.37      0.93      1.00

Table 5

Intercorrelations Among Measurement Mode Standard Deviations


                                             Peer      Supervisor
Mode                 Hands-On    Written     Rating    Rating

Hands-On               1.00
Written                0.40        1.00
Peer Rating            0.48        0.17      1.00
Supervisor Rating      0.42        0.20      0.70      1.00

This research was funded by the U.S. Army Research Institute for the
Behavioral and Social Sciences, Contract No. MDA903-82-C-0531. All
statements expressed in this paper are those of the authors and do not
necessarily express the official opinions or policies of the U.S. Army
Research Institute or the Department of the Army.




USING  CONFIRMATORY  FACTOR  ANALYSIS 
TO  AID  IN  ASSESSING  TASK  PERFORMANCE 


Jeffrey J. McHenry
American Institutes for Research

James H. Harris
Human Resources Research Organization

Scott M. Oppler
American Institutes for Research


Presented  on  Symposium, 

"Innovations  in  Manpower  Research  Methods: 
Current  Practice  and  Suggestions" 

At  the  Annual  Conference  of  the 
Military  Testing  Association 
Mystic,  Connecticut 

November  1986 


The  views  expressed  in  this  paper  are  those  of  the  authors  and  do  not 
necessarily  reflect  the  official  opinions  and  policies  of  the  U.S.  Army 
Research  Institute  or  the  Department  of  the  Army. 




Using  Confirmatory  Factor  Analysis 
to Aid in Assessing Task Performance¹


Jeffrey  J.  McHenry 
American  Institutes 
for  Research 
Washington,  DC 


James  H.  Harris 
Human  Resources  Research 
Organization 
Alexandria,  VA 


Scott M. Oppler

American  Institutes  for  Research 
Washington,  DC 

In  their  landmark  1959  paper,  Campbell  and  Fiske  urged 
psychologists to adopt a multitrait-multimethod approach to the
measurement  of  psychological  constructs.  Over  the  past  25  years, 
psychologists  have  applied  Campbell  and  Fiske’s  ideas  to  a  host  of 
assessment  problems. 

The Campbell and Fiske paper had a profound impact on the design
of the U.S. Army Research Institute's Project A. The goal of Project
A is to validate the Armed Services Vocational Aptitude Battery
(ASVAB) and a set of new, experimental predictor tests. Through the
first four years of Project A, we have devoted much of our time and
resources to the development of reliable, valid measures of job
performance. The development efforts were guided by our theory of job
performance, which holds that job performance is multidimensional.
There is no single attribute, outcome, or factor that can be pointed
to and labeled as "job performance" (Campbell & Harris, 1985; Hanser,
Arabian & Wise, 1985). Consequently, one of the critical activities
in performance measurement is to describe the basic factors that
comprise performance. To ensure that these factors were measured
adequately, four different types of job performance measures were
developed: hands-on job sample tests, multiple-choice knowledge
tests, performance rating scales, and administrative measures.

In a large-scale study of those measures, almost 5000 first-tour
enlisted personnel in nine Army Military Occupational Specialties
(MOS) participated in a one and one-half day job performance
assessment last summer and fall. Their data were used to help build a
model of first-tour enlistee job performance (Wise, Campbell, McHenry
& Hanser, 1986).

In developing this model, one of the first things we noticed was
that scores on the hands-on and written job knowledge tests were
fairly highly correlated, as were scores from the rating scales and
administrative measures. However, the hands-on and written tests were
only moderately correlated with the performance ratings and
administrative measures, suggesting that these different measurement
methods were tapping different portions of the job performance space.
The hands-on and written knowledge tests were measuring "can do" or
maximal performance, while the rating scales and administrative
measures were assessing "will do" or typical performance. Within the
"can do" performance domain, two performance constructs were
identified. The first, Core Technical Proficiency, was comprised of
those performance components that were specific to a particular job
(e.g., "typing correspondence" for an administrative specialist,
"driving a tank" for a tank crewman, etc.). The second construct,
General Soldiering Proficiency, was defined by common soldier tasks
(e.g., navigation, first aid, operating an M16). In addition to these
two "can do" constructs, three "will do" constructs were also
identified: Effort and Leadership; Personal Discipline; and Physical
Fitness and Military Bearing.

¹This research was funded by the Army Research Institute Contract
No. MDA903-82-C-0531. All statements expressed in this paper are
those of the authors and do not necessarily express the official
opinions of the U.S. Army Research Institute or the Department of the
Army.

One of the most important implications from the Wise et al. study
is that researchers must be aware of possible confounds between trait
and method when they use a multitrait-multimethod approach to
assessment. However, in the Wise et al. study, the performance
ratings were not designed to measure the same traits as the hands-on
and written knowledge tests. The performance rating scales were
designed to measure broad dimensions of job performance, and had been
developed using the critical incident technique (Flanagan, 1954). The
hands-on and written tests were designed to measure performance of
critical tasks. The purpose of this paper is to see if similar
results are obtained when task-specific performance rating scales are
used instead of rating scales developed from critical incidents.

Method 


Subjects 

Subjects  were  first-tour  enlisted  soldiers  drawn  from  the 
following  nine  MOS: 

•  Infantryman (11B) (N = 613)
•  Cannon Crewman (13B) (N = 535)
•  Armor (Tank) Crewman (19E) (N = 410)
•  Radio Teletype Operator (31C) (N = 280)
•  Light Wheel Vehicle Mechanic (63B) (N = 477)
•  Motor Transport Operator (Truck Driver) (64C) (N = 527)
•  Administrative Specialist (71L) (N = 344)
•  Medical Specialist (91A) (N = 410)
•  Military Police (95B) (N = 588)

Measures

The  following  three  sets  of  measures  were  administered  to  each 
subject: 

• Hands-on performance tests on approximately 15 critical tasks.
These tasks were carefully sampled from the domain of important
tasks for each job. Each hands-on test consisted of a number
of critical steps, with each step scored GO or NO GO. The
number of steps within a task varied from as few as six to as
many as 62. The hands-on task score was the percent of steps
scored GO.

•  Written  job  knowledge  tests  consisting  of  three  to  15  questions 
on  each  of  the  critical  tasks.  The  score  on  each  task  was  the 
percent  of  questions  answered  correctly. 

• Supervisor and peer ratings of performance on each of the
critical tasks. Each rater rated his/her assigned subject's
performance on each task in terms of how well the subject
performed the task compared to other soldiers. On average,
subjects were rated by two supervisors and three peers. Mean
supervisor and mean peer ratings were computed for each task.
These two mean ratings were then averaged to compute the final
task rating.
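
The three scoring rules above can be sketched in a few lines of Python. This is only an illustration of the arithmetic described in the text; the function names, the example step outcomes, the answer key, and the rating values are all hypothetical and do not come from the study data.

```python
from statistics import mean

def percent_go(steps):
    """Hands-on task score: percent of steps scored GO."""
    return 100.0 * sum(1 for s in steps if s == "GO") / len(steps)

def percent_correct(answers, key):
    """Written task score: percent of questions answered correctly."""
    return 100.0 * sum(a == k for a, k in zip(answers, key)) / len(key)

def task_rating(supervisor_ratings, peer_ratings):
    """Final task rating: average of the mean supervisor and mean peer rating."""
    return mean([mean(supervisor_ratings), mean(peer_ratings)])

# Hypothetical data for one soldier on one task.
hands_on = percent_go(["GO", "GO", "NO GO", "GO", "GO", "GO"])
written = percent_correct(["b", "c", "a", "d"], ["b", "c", "d", "d"])
rating = task_rating([5, 6], [4, 5, 6])  # two supervisors, three peers

print(round(hands_on, 1), written, rating)
```

Note that averaging the two means, rather than pooling all five raters, gives supervisors and peers equal weight regardless of how many of each rated the soldier.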


Results 


Model  of  Task  Performance 

Campbell (in preparation) has described a model of first-tour
soldier task performance that was derived using the data from the
subjects in this study. Briefly, the intercorrelations among the
within-method task scores were examined to identify similarities
across methods and across MOS. On this basis, five task factors were
identified:

• Core Technical. Included tasks that were specific to the MOS
(e.g., "typing correspondence" for an administrative
specialist, "driving a tank" for a tank crewman, etc.).

• Communication. Included tasks related to operating a radio
set.

• Vehicle Operation and Maintenance. Included tasks involving
driving a vehicle and performing simple operator maintenance.

• General Soldiering. Included tasks that are critical to field
and combat performance, such as weapons operation and
maintenance, navigation, etc.

• Safety and Survival. Included tasks related to safety and
first aid, including procedures for coping with nuclear/
biological/chemical (NBC) conditions.

Each of the critical tasks was assigned to one of the five task
factors. As Table 1 shows, some of the factors were not assessed for
some of the MOS. For example, for Administrative Specialist (71L),
there were no tasks for two of the factors: Communication, and
Vehicle Operation and Maintenance. For Infantryman (11B) and Motor
Transport Operator (64C), the table indicates that there was no Core
Technical task factor. This is because Communication, General
Soldiering, and Safety and Survival are the core technical part of the
11B job, and Vehicle Operation and Maintenance is the core technical
part of the 64C job.


Table  1 

Measurement  of  Task  Factors  by  MOS 


Task  Factor 

11B

13B 

19E 

31C 

63B 

64C 

71L 

91A 

95B 

Core  Technical 

X 

X 

X 

X 

X 

X 

X 

Communication 

X 

X 

X 

X 

X 

Vehicles 

X 

X 

General  Soldiering 

X 

X 

X 

X 

X 

X 

X 

X 

X 

Safety/Survival 

X 

X 

X 

X 

X 

X 

X 

X 

X 

Analyses 

The objective of this study was to test whether the observed
correlations among the hands-on and written knowledge tests and task
ratings were consistent with the Campbell task factor model.
Confirmatory factor analysis (Joreskog & Sorbom, 1981) was used to
conduct this test.

To perform a confirmatory factor analysis, one must first specify
a set of latent constructs that explains the relationships among a set
of observed variables. In the present study, two sets of latent
constructs were hypothesized. The first consisted of the task factors
identified by Campbell. The second included three method factors,
representing the three measurement methods that were used to assess
subjects' performance.

Each task score was allowed to "load" on one task factor and on
one method factor. For example, we allowed the hands-on task score
for "typing correspondence" for 71L to load on the Core Technical task
factor and the Hands-On method factor; its loadings on the remaining
factors were constrained to zero.

We  also  specified  the  relationships  among  the  underlying  factors. 
We  specified  that  the  three  method  factors  were  uncorrelated  with  each 
other  and  with  any  of  the  task  factors.  However,  we  allowed  the  task 
factors  to  be  correlated. 
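
The specification described above can be written down as a pattern of free and fixed parameters. The following Python sketch is an illustration only: it does not reproduce LISREL input, the three tasks and their factor assignments are hypothetical, and only three of the five task factors are shown to keep the example small.

```python
TASK_FACTORS = ["Core Technical", "General Soldiering", "Safety and Survival"]
METHOD_FACTORS = ["Hands-On", "Written", "Ratings"]
FACTORS = TASK_FACTORS + METHOD_FACTORS

# Hypothetical tasks for one MOS, each assigned to exactly one task factor.
TASKS = {
    "typing correspondence": "Core Technical",
    "navigation": "General Soldiering",
    "first aid": "Safety and Survival",
}

def loading_pattern(tasks, methods, factors):
    """Return {(task, method): [1 if the loading is free, 0 if fixed to zero]}."""
    pattern = {}
    for task, task_factor in tasks.items():
        for method in methods:
            pattern[(task, method)] = [
                1 if f in (task_factor, method) else 0 for f in factors
            ]
    return pattern

def factor_correlation_pattern(task_factors, method_factors):
    """Return a matrix of 'fixed 1', 'free', or 'fixed 0' entries."""
    factors = task_factors + method_factors
    rows = []
    for a in factors:
        row = []
        for b in factors:
            if a == b:
                row.append("fixed 1")   # factor variances standardized
            elif a in task_factors and b in task_factors:
                row.append("free")      # task factors may correlate
            else:
                row.append("fixed 0")   # method factors uncorrelated with everything
        rows.append(row)
    return rows

pattern = loading_pattern(TASKS, METHOD_FACTORS, FACTORS)
phi = factor_correlation_pattern(TASK_FACTORS, METHOD_FACTORS)

# One free loading on Core Technical and one on Hands-On.
print(pattern[("typing correspondence", "Hands-On")])
```

Every observed score ends up with exactly two free loadings, which is the constraint that makes the task and method contributions separable.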

The  confirmatory  factor  analysis  program,  LISREL,  then  derived 
the  non-zero  loadings  of  the  tasks  on  the  task  and  method  factors  and 
the  correlations  between  the  task  factors.  These  loadings  and 
correlations  were  derived  to  be  as  consistent  as  possible  with  the 
observed  correlations  among  the  task  scores. 

Finally, LISREL computed a chi-square index to describe the level
of agreement between the observed correlations and the factor loadings
and correlations that it had derived. Essentially, LISREL does this
by working backwards and estimating the correlations from the factor
loadings and correlations, then comparing these estimated correlations
to the observed correlations. A large and significant chi-square
value indicates that the observed and estimated correlations differ.
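
The "working backwards" step can be illustrated with a small Python sketch. Under a factor model with standardized factors, the estimated correlation between two scores is the sum, over all pairs of factors, of the product of the two loadings and the correlation between the factors. All loadings, factor correlations, and observed correlations below are hypothetical, and the sketch stops at the residuals; it does not compute the chi-square index that LISREL reports.

```python
def implied_correlation(load_i, load_j, phi):
    """Estimated correlation: sum over k, l of load_i[k] * phi[k][l] * load_j[l]."""
    size = len(phi)
    return sum(
        load_i[k] * phi[k][l] * load_j[l]
        for k in range(size)
        for l in range(size)
    )

# Factor order: [Core Technical, General Soldiering, Hands-On, Written].
# Task factors correlate (.6); method factors are uncorrelated with everything.
phi = [
    [1.0, 0.6, 0.0, 0.0],
    [0.6, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.0],
    [0.0, 0.0, 0.0, 1.0],
]

# Hypothetical loadings: one task factor and one method factor per score.
loadings = {
    "typing, hands-on": [0.7, 0.0, 0.4, 0.0],
    "typing, written": [0.6, 0.0, 0.0, 0.5],
    "navigation, hands-on": [0.0, 0.5, 0.4, 0.0],
}

# Hypothetical observed correlations among the three scores.
observed = {
    ("typing, hands-on", "typing, written"): 0.45,
    ("typing, hands-on", "navigation, hands-on"): 0.30,
    ("typing, written", "navigation, hands-on"): 0.25,
}

residuals = {}
for (a, b), r_obs in observed.items():
    r_hat = implied_correlation(loadings[a], loadings[b], phi)
    residuals[(a, b)] = r_obs - r_hat
    print(f"{a} / {b}: observed {r_obs:.2f}, estimated {r_hat:.2f}")
```

Two scores that share a method factor (the two hands-on scores) pick up an extra term from that factor, which is how the model separates method variance from task variance.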

The  portion  of  Table  2  labeled  "With  Task  Ratings"  shows  results 
from  the  present  study.  The  table  shows  that  the  observed  and 
estimated  correlations  differed  significantly  for  all  nine  MOS. 

Table  2 

Fit  between  the  Task  Factor  Model  and  the  Observed  Correlations 


            With Task Ratings      Without Task Ratings         Change
MOS        Chi2     df     p       Chi2     df     p        Chi2     df     p

11B       632.6    492   .00      182.6    206   .88       450.0    286   .00
13B      3250.7   1218   .00      788.2    521   .00      2462.5    697   .00
19E      1033.5    696   .00      232.4    293   .99       801.1    403   .00
31C      1372.5    935   .00      439.8    395   .06      1335.7    540   .00
63B      1300.5    942   .00      440.3    402   .09       860.2    540   .00
64C       791.5    492   .00      234.9    206   .08       556.6    286   .00
71L       950.7    492   .00      225.2    206   .17       725.5    286   .00
91A      1910.1    942   .00      719.7    402   .00      1190.4    540   .00
95B      1359.0    813   .00      414.5    344   .01       944.5    469   .00



We felt that there were two possible reasons for this result.
Our first hypothesis was that the model was not appropriate, and that
a different set of task factors would do a better job of explaining
the observed correlations among task scores. Our second hypothesis
was that the model was working quite well for the hands-on and job
knowledge tests, but was not appropriate for the task ratings because
the task ratings were not measuring "can do" performance. We chose to
investigate this second hypothesis.

Marsh and Hocevar (1983) have suggested a method for testing such
hypotheses using LISREL. To implement their suggestion, we re-ran
LISREL without the task ratings data (and dropping the ratings method
factor). According to Marsh and Hocevar, one can compare the
chi-square and degrees of freedom from the new analyses with the
chi-square and degrees of freedom from the original analyses to determine
whether the model fit the data better after the ratings data were
dropped. The portion of Table 2 labeled "Change" shows that the
improvement in fit was significant for all nine MOS. The portion
labeled "Without Task Ratings" shows that the Campbell model was
consistent with the observed correlations for seven of the nine MOS.
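
The comparison amounts to subtracting the two chi-square values and the two degrees of freedom, then referring the difference to a chi-square distribution. The Python sketch below works through the 11B row of Table 2. The tail probability uses the Wilson-Hilferty normal approximation, which is an assumption made here so the example needs only the standard library; LISREL's own probability values would be computed exactly, and the function names are hypothetical.

```python
from statistics import NormalDist

def chi2_upper_tail(chi2, df):
    """Approximate P(X >= chi2) for a chi-square variable (Wilson-Hilferty)."""
    spread = 2.0 / (9.0 * df)
    z = ((chi2 / df) ** (1.0 / 3.0) - (1.0 - spread)) / spread ** 0.5
    return 1.0 - NormalDist().cdf(z)

def chi2_change(chi2_with, df_with, chi2_without, df_without):
    """Difference in chi-square and df, with its approximate tail probability."""
    d_chi2 = chi2_with - chi2_without
    d_df = df_with - df_without
    return d_chi2, d_df, chi2_upper_tail(d_chi2, d_df)

# 11B row of Table 2: with task ratings, then without task ratings.
d_chi2, d_df, p_change = chi2_change(632.6, 492, 182.6, 206)
p_without = chi2_upper_tail(182.6, 206)

print(round(d_chi2, 1), d_df, round(p_change, 2), round(p_without, 2))
```

For this row the change is large relative to its degrees of freedom, while the fit without the ratings is not significantly different from the observed correlations, matching the pattern reported in the table.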

Discussion 

The results in Table 2 indicate that the factor structure of the
task rating scales is different from that of the hands-on and written
job knowledge tests. Other analyses (not reported in this paper)
indicated that the performance construct most highly correlated with
the task rating scales was the Effort and Leadership "will do"
performance construct.

The data point to the need to consider the relationship between
measurement methods and traits when employing multitrait-multimethod
techniques to assess individual differences. Even though measures
drawn from two methods have the same name (e.g., "driving a tank"), it
is no guarantee that they measure the same underlying construct.
Researchers must be guided by theory and previous research in deciding
when it is appropriate to expect that measures from different methods
will be useful in analyzing a given construct.

Within the field of performance measurement, for example, Hunter
(1983) has shown that the relationship between cognitive abilities and
supervisory performance ratings is different from the relationship
between cognitive abilities and hands-on or written knowledge tests.
Hunter has developed a theory to account for the relationships among
different performance measures. His work, the Wise et al. (1986)
research, and this research all suggest that one should not expect a
one-to-one correspondence between performance ratings and other
measures of job performance.

Other results from Project A promise to shed additional light on
the constructs underlying different performance measures. For
example, preliminary results of Project A validity analyses (Campbell,
1986) indicate that cognitive ability tests are much more highly
correlated with the "can do" performance constructs (i.e., with scores
from the hands-on and written knowledge tests) than with the "will do"
performance constructs (i.e., with performance ratings and
administrative measures). On the other hand, the Assessment of
Background and Life Experiences (ABLE) (Hough, Barge & Kamp, in
press), a temperament/biodata questionnaire, was a much better predictor
of "will do" performance than "can do" performance. In fact, the
validity of ABLE scales often exceeded the validity of ASVAB scales for
predicting performance ratings (Campbell, 1986).

Finally,  the  present  study  demonstrates  the  usefulness  of  confirmatory  factor 
analysis  for  testing  theories  about  the  latent  variables  underlying  a  set  of  observed 
scores.  Most  commonly,  researchers  use  confirmatory  factor  analysis  programs  such 
as  LISREL  to  obtain  statistical  tests  of  the  agreement  between  their  theories  and  a  set 
of  observed  data  (Joreskog  &  Sorbom,  1981).  In  this  study,  we  also  used  LISREL  to 
test  two  competing  theories  (Marsh  &  Hocevar,  1983).  As  these  results  demonstrate, 
LISREL  provides  a  powerful  tool  for  improving  the  quality  of  our  theories  and  the 
conclusions  that  we  draw  from  our  data. 

References 

Campbell, C. H. (in preparation). Developing basic criterion scores for hands-on tests,
job knowledge tests, and task rating scales (ARI-TR ). Alexandria, VA: U.S.
Army Research Institute.

Campbell, D. T., & Fiske, D. W. (1959). Convergent and discriminant validation by the
multitrait-multimethod matrix. Psychological Bulletin, 56, 81-105.

Campbell, J. P. (1986, August). Project A: When the textbook goes operational.
Paper presented at the 94th Annual Convention of the American Psychological
Association, Washington, DC.

Campbell, J. P., & Harris, J. H. (1985, August). Criterion reduction and combination
via a participative decision making panel. Paper presented at the 93rd Annual
Convention of the American Psychological Association, Los Angeles.

Flanagan, J. C. (1954). The critical incident technique. Psychological Bulletin, 51,
327-358.

Hanser, L. M., Arabian, J. M., & Wise, L. L. (1985). Multidimensional performance
measurement. Proceedings of the 27th Annual Conference of the Military Testing
Association. San Diego: Military Testing Association.

Hough, L. M., Barge, B. N., & Kamp, J. D. (in press). Non-cognitive measures: Pilot
testing. In N. G. Peterson (Ed.), Development and field test of the Trial
Battery for Project A (ARI-TR ). Alexandria, VA: U.S. Army Research Institute for
the Behavioral and Social Sciences.

Hunter, J. E. (1983). A causal analysis of cognitive ability, job knowledge, job
performance, and supervisory ratings. In F. Landy, S. Zedeck, & J. Cleveland
(Eds.), Performance measurement and theory. Hillsdale, NJ: Lawrence Erlbaum
Associates.

Joreskog, K. G., & Sorbom, D. (1981). LISREL VI user's guide. Mooresville, IN:
Scientific Software.

Marsh, H. W., & Hocevar, D. (1983). Confirmatory factor analysis of multitrait-
multimethod matrices. Journal of Educational Measurement, 20, 231-248.

Wise, L. L., Campbell, J. P., McHenry, J. J., & Hanser, L. M. (1986, August).
A latent structure model of job performance factors. Paper presented at the 94th
Annual Convention of the American Psychological Association, Washington, DC.




INFLUENCE OF ENVIRONMENT, ABILITY, AND TEMPERAMENT ON
PERFORMANCE IN ARMY MOS


Darlene  M.  Olson 
U.S.  Army  Research  Institute 

Walter  C.  Borman 

Personnel  Decisions  Research  Institute 


Presented on Session, "Organizational Effectiveness"

At the Annual Conference of the
Military Testing Association
Mystic, Connecticut

November  1986 


The  views  expressed  in  this  paper  are  those  of  the  authors  and  do  not 
necessarily  reflect  the  official  opinions  and  policies  of  the  U.S.  Army 
Research  Institute  or  the  Department  of  the  Army. 




INFLUENCE OF ENVIRONMENT, ABILITY AND TEMPERAMENT ON PERFORMANCE IN ARMY MOS


Darlene M. Olson                     Walter C. Borman

U.S. Army Research Institute*        Personnel Decisions Research Institute

Job performance has been conceptualized as a product of abilities,
skills, and personal characteristics that individuals bring to the Army, of
environmental experiences that influence a soldier after enlistment, and of
the person's motivation to perform. Although a substantial portion of the
total variability in performance criteria can be explained by individual
difference factors, work environment variables related to support, training
opportunities, and perceived job importance have been found to have weak, but
consistently significant relationships with supervisory ratings of soldier
effectiveness, Army-wide rating factors (e.g., Personal Discipline) and
measures of hands-on task proficiency (Olson & Borman, 1986).

The impact of cognitive abilities, temperament, work environment and
their possible interactive effects on job performance should be investigated.
Peters & O'Connor (1980) have proposed that environmental factors may
moderate the relationships between ability and performance. In contrast,
Schmidt and Hunter (1977) have contended that the prediction of performance
from ability is stable across situations and over time for various jobs. More
current research (e.g., Staw & Ross, 1985) has found dispositional effects
for job satisfaction criteria. Hence, research suggests that both person and
environment factors should play a role in explaining the variability in
soldier performance.

The model of soldier effectiveness advanced here assumes that
performance is influenced by a soldier's abilities and temperament, which are
measured when entering the military, and individual perceptions of the work
environment developed through experience with the Army job setting. In this
context, the purpose of this research was to investigate potential moderating
effects of work environment dimensions on the relationship between individual
differences and job performance in four clusters of Army jobs.

Method 

Subjects. The sample contained 5080 first-term Army enlisted personnel
in 9 different jobs. There were 673 infantrymen, 629 cannon crewmen, 485
armor crewmen, 351 radio operators, 618 light-wheel vehicle mechanics, 659
motor transport operators, 500 administrative specialists, 485 medical
specialists, and 680 military police. These MOS were sampled at 11 continental
United States and four European Army installations. These jobs were grouped
into one combat (11B, 13B, and 19E MOS) and three non-combat clusters
[Clerical (71L MOS), Operations (31C, 63B, and 64C MOS), and Skilled Technical
(91A and 95B MOS)]. Previous empirical research (McLaughlin et al., 1984)
demonstrated that the above clusters are sufficient to group Army jobs on the
basis of aptitudes measured by ASVAB.

Performance Measures. Criterion development work was conducted by the
Project A contractors and included construction of the following measures: 1)
Army-wide rating scales relevant for evaluating soldiers in any first-tour
Army job, 2) job-specific rating scales, 3) hands-on task proficiency
measures, and 4) job knowledge tests. The Army-wide rating scales were developed
using a variant of the behaviorally-anchored rating scale methodology, and
emphasize performance dimensions relevant to any MOS (e.g., maintaining
equipment). The job-specific scales, which were also 7-point behavior
summary scales, focus on narrow performance areas relevant to a designated job
(e.g., transporting personnel for the motor transport operator job). The
hands-on tests consisted of 15 MOS-specific tasks. Hands-on scores were
computed for each soldier by averaging the proportions passed across the
tasks tested. Multiple choice tests were developed to assess job knowledge
relevant to important and representative tasks in an MOS. A total job
knowledge score for each research participant was derived as a percentage of the
number of items answered correctly. Factor analysis of the performance
ratings resulted in an interpretable solution: 1) Effort and Leadership, 2)
Personal Discipline, and 3) Military Bearing (Campbell, Hanser, & Wise, 1986).
Factor scores for the performance ratings, along with an overall soldier
effectiveness composite based on the unit weighting of ratings on the Army-wide
dimensions, were used in subsequent analyses.

*The views expressed in this paper are those of the authors and do not
necessarily reflect the view of the U.S. Army Research Institute or the
Department of the Army.

Work Environment Measures. The Army Work Environment Questionnaire
(AWEQ), a revised 53-item multiple choice questionnaire, measures the
following Army environmental constructs: 1) Resources, 2) Supervisor Support, 3)
Training/Opportunities to Use MOS Skills, 4) Job/Task Importance, and 5)
Cohesion/Peer Support. AWEQ items are answered using a 5-point frequency
rating scale (e.g., 1 = Very Seldom or Never to 5 = Very Often or Always).
Respondents are asked to indicate how often each environmental situation
described in an item occurs on their present job. Items consist of
statements such as "You get recognition from supervisors for the work you do"
(Supervisor Support). Five standardized unit weighted factor scores are
derived for the AWEQ.
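
A standardized unit weighted score can be illustrated as follows. The paper does not give the exact computation, so this Python sketch assumes one common reading: each item is converted to z-scores across respondents, the items belonging to a construct are summed with equal (unit) weights, and the sum is standardized again. The three items and the five respondents' answers are hypothetical.

```python
from statistics import mean, pstdev

def zscores(values):
    """Standardize a list of values to mean 0 and standard deviation 1."""
    m, s = mean(values), pstdev(values)
    return [(v - m) / s for v in values]

def unit_weighted_score(item_responses):
    """Combine items with equal weights into one standardized score.

    item_responses is a list of items; each item is a list holding one
    response per respondent, in the same respondent order.
    """
    z_items = [zscores(item) for item in item_responses]
    composite = [sum(vals) for vals in zip(*z_items)]
    return zscores(composite)

# Hypothetical 5-point responses from five soldiers to three
# Supervisor Support items.
supervisor_support_items = [
    [1, 3, 4, 5, 2],
    [2, 3, 5, 4, 1],
    [1, 2, 4, 5, 3],
]
scores = unit_weighted_score(supervisor_support_items)
print([round(s, 2) for s in scores])
```

Unit weighting avoids estimating item weights from the sample, so the resulting scores do not capitalize on chance the way regression-based factor scores can.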

Cognitive Ability. A composite measure of four subtests from the Armed
Services Vocational Aptitude Battery (ASVAB), known as the Armed Forces
Qualification Test (AFQT), was used as an assessment of general cognitive
abilities.

Temperament Measures. The Assessment of Background and Life Experiences
(ABLE) inventory (Peterson, Hough, Ashworth, & Toquam, 1986), which includes
ten temperament/biodata scales, was administered as a self-report measure of
soldier temperament. From factor analysis of the ABLE a three factor
solution emerged: 1) Achievement, 2) Dependability and 3) Adjustment. The
Achievement factor has items loading from the Self-Esteem, Work Orientation,
Dominance and Energy-Level scales. The Dependability factor contains items
from the Non-delinquency, Traditional Values, Conscientiousness,
Cooperativeness, and Internal Control scales. The Adjustment factor has items loading
from the Emotional Stability scale.

Procedures. The rating scales were administered to groups of 15 or fewer
peers or supervisors of the target ratees after they were trained using a
combination error and accuracy training program. During the peer rating
sessions, raters (who were in addition ratees and members of the research
sample) also responded to the AWEQ. The ABLE inventory was administered in
separate small group sessions. Task proficiency measures were administered
to each soldier by experienced job incumbents or supervisors, who were
trained to evaluate and score each hands-on task. MOS-specific job knowledge
tests were given to groups of 15-30 soldiers.




Results 


Regression Analyses. Moderated regression analysis was used to estimate
the relationships of ability, temperament, perceptions of the work
environment, and their interactions to typical performance ratings and more
objective performance criteria. A series of four separate regression models was
built for each of the four performance measures nested in each job cluster.
First, the separate performance variables were regressed on an individual
differences model, which contained AFQT and three temperament factor scores,
to determine the contribution of individual differences at the time of
enlistment to subsequent job performance. Second, an environmental model,
which contained the five work environment constructs, was used to predict the
separate performance measures to examine the amount of variance explained by
these variables. Third, a full model containing both individual differences
and environmental factors was tested. Finally, a set of interactions among
the predictors (ability X temperament, ability X environment, and temperament
X environment factors) was added to the full model and the separate
performance criteria were regressed on it to determine the post-enlistment
interrelationships among environmental/organizational influences on soldier
performance and the expression of individual differences in ability and
temperament on the job.
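
The four-step sequence can be sketched in Python on synthetic data. Everything in this example is an assumption made for illustration: a single ability predictor and a single environment predictor stand in for the full predictor sets, the simulated coefficients are arbitrary, and ordinary least squares is solved with a small hand-written routine so that only the standard library is needed. It shows how the increment in R-squared from the interaction term is obtained, not the results of the study.

```python
import random

def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x

def r_squared(columns, y):
    """R-squared from an OLS fit of y on the predictor columns plus an intercept."""
    n = len(y)
    X = [[1.0] + [col[i] for col in columns] for i in range(n)]
    k = len(X[0])
    xtx = [[sum(X[i][a] * X[i][b] for i in range(n)) for b in range(k)]
           for a in range(k)]
    xty = [sum(X[i][a] * y[i] for i in range(n)) for a in range(k)]
    beta = solve(xtx, xty)
    fitted = [sum(bj * xj for bj, xj in zip(beta, row)) for row in X]
    mean_y = sum(y) / n
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot

# Synthetic, standardized predictors (hypothetical stand-ins).
rng = random.Random(0)
n = 500
ability = [rng.gauss(0, 1) for _ in range(n)]
support = [rng.gauss(0, 1) for _ in range(n)]
interaction = [a * s for a, s in zip(ability, support)]
performance = [0.5 * a + 0.3 * s + 0.2 * a * s + rng.gauss(0, 1)
               for a, s in zip(ability, support)]

r2_individual = r_squared([ability], performance)                  # step 1
r2_environment = r_squared([support], performance)                 # step 2
r2_full = r_squared([ability, support], performance)               # step 3
r2_moderated = r_squared([ability, support, interaction], performance)  # step 4

print(f"increment from interaction: {r2_moderated - r2_full:.3f}")
```

The quantity of interest for a moderator effect is the last line: the variance explained by the product term beyond what the main effects already explain.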

The regression analyses are presented in Table 1. In each of the four
job clusters, the highest multiple correlations were observed for the
prediction of job knowledge, with R ranging from .37 to .57, p < .05. Ability
explained the largest amount of variance in job knowledge scores. Generally,
the full model of individual differences accounted for more variance in the
performance measures than was explained by the environmental model. However,
for both the Operations and Skilled Technical job clusters, higher multiple
correlations (Rs = .32 and .26, respectively) were obtained for the
prediction of task proficiency from environmental models as compared with the
individual differences model (R = .14 and .20, respectively).

In the clerical and combat jobs, soldier ability and temperament
characterized by Dependability accounted for the most variance in the performance
criteria. For the Operations and Skilled Technical MOS, the temperament
factors (particularly Dependability and Achievement) explained significant
variability in the rating measures, and soldier ability tended to account for
significant variance in the more objective performance measures. The
environmental model accounted for 3-10% of the variability in criterion measures
for the separate job clusters. The largest standardized regression
coefficients were observed for the prediction of ratings from Supervisor Support
and Job/Task Importance factors. Training had a strong main effect for the
prediction of task proficiency and job knowledge measures for the MOS
clusters. Further, those variables with the largest standardized beta
coefficients in the separate individual differences and environmental models were
retained in the full model of main effects for the clusters.

Table 2 shows that ability X Job/Task Importance interaction effects
tended to be significant across MOS clusters (except for Operations) and
performance measures (except for hands-on). The majority of interaction
effects were concentrated between individual differences related to soldier
temperament and work environment constructs. Specifically, temperament
factors related to Dependability and Adjustment interacted with soldier
perceptions of Job/Task Importance, level of Supervisor Support, and available
organizational Resources. Fewer significant interactions were observed
between cognitive ability (AFQT) and temperament in the prediction of job
performance. Training X Achievement and Cohesion/Peer Support X Adjustment
interactions significantly predicted task proficiency in Combat and
Operations clusters respectively. Further, for the Operations jobs, several
significant interaction effects between soldier perceptions of Resources and
individual differences were found to predict maximal performance criteria.

Generally, when designated interactions are added to the full model of
main effects, only about 1% of the variance in performance beyond that
explained by main effects can be attributed to interactions. However, for the
Clerical MOS, interaction effects accounted for an additional 3-7% of the
variability in soldier performance, with higher percentages of explained
variance associated with the more objective performance criteria.

Discussion 

This research examined relationships among individual differences in
ability and temperament, perceptions of the Army work environment, and the
performance of first term enlisted personnel. Findings revealed that
individual differences and environmental perceptions have independent effects on
performance in the four job clusters. Some differential effects were found
across job clusters with maximal performance (e.g., job knowledge and task
proficiency) predicted best from cognitive ability (AFQT) in the Clerical and
Combat jobs.

Significant effects for the work environment indicate that both types of
typical performance ratings are predicted from the more climate-oriented
constructs of Supervisor Support and Job/Task Importance, particularly in the
Combat and Operations clusters. In contrast, soldiers' perceptions of
Training and their opportunities to utilize MOS skills, as well as the
availability of Resources (e.g., tools and equipment) tended to predict both job
knowledge and task proficiency measures for all job groups. Interaction
results show that both temperament and work environment factors moderate the
relationship between ability and performance. In addition, work environment
factors related primarily to Supervisor Support, Resources, and Job/Task
Importance, and to a lesser extent Training tended to moderate the
relationships between individual temperament factors and performance.

These findings tentatively indicate that job performance is influenced
not only by individual differences in ability, but also by the dispositions
that soldiers bring to the Army and their perceptions of the environmental
context encountered after enlistment, regardless of how jobs are clustered.
Further, findings suggest that pre-enlistment differences among soldiers in
ability and temperament interact with their environmental perceptions in the
prediction of various performance outcomes. Considerable variance in soldier
performance can be attributed to the main effects of individual differences
and environmental perceptions, and generally significant interactions among
these factors explain little meaningful variance.

References

Campbell, J., Hanser, L., & Wise, L. (1986, November). The development of a
model of Project A criterion space. Paper presented at the 28th Annual
Conference of the Military Testing Association, Mystic, Connecticut.

McLaughlin, D. H., Rossmeissl, P. G., Wise, L. L., Brandt, D. A., & Wang,
M. (1984). Validation of current Armed Services Vocational Aptitude
Battery (ASVAB) composites (Technical Report No. 651). Alexandria, VA:
U.S. Army Research Institute for the Behavioral and Social Sciences.

Olson, D. M., & Borman, W. C. (1986). Development and field tests of the
Army Work Environment Questionnaire (Working Paper RS-WP-86-06).
Alexandria, VA: U.S. Army Research Institute for the Behavioral and
Social Sciences.

Peters, L. H., & O'Connor, E. J. (1980). Situational constraints and work
outcomes: The influences of a frequently overlooked construct. Academy of
Management Review, 5, 391-397.

Peterson, N., Hough, L., Ashworth, S., & Toquam, J. (1986, November). New
predictors of soldier performance. Paper presented at the 28th Annual
Conference of the Military Testing Association, Mystic, Connecticut.

Schmidt, F. L., & Hunter, J. E. (1977). Development of a general solution to
the problem of validity generalization. Journal of Applied Psychology,
62, 529-540.

Staw, B. M., & Ross, J. (1985). Stability in the midst of change: A
dispositional approach to job attitudes. Journal of Applied Psychology,
70(3), 469-480.




NEW  PREDICTORS  OF  SOLDIER  PERFORMANCE 


Norman  Peterson 
Leaetta  Hough 
Steve  Ashworth 
Jody  Toquam 

Personnel  Decisions  Research  Institute 


Presented  on  Symposium, 

"Project  A  Concurrent  Validation:  Preliminary  Results" 

At  the  Annual  Conference  of  the 
Military  Testing  Association 
Mystic,  Connecticut 

November  1986 


The  views  expressed  in  this  paper  are  those  of  the  authors  and  do  not 
necessarily  reflect  the  official  opinions  and  policies  of  the  U.S.  Army 
Research  Institute  or  the  Department  of  the  Army. 



Introduction 

New predictors of soldier performance have been developed as part of
Project A. Previous papers presented to this association have described
the theoretical approach, development, and pilot and field testing of those
predictors (Hough, McGue, Kamp, Houston, & Barge, 1985; McHenry & Toquam,
1985; Peterson, 1985; Rosse & Peterson, 1985; Toquam, Dunnette, Corpe, &
Houston, 1985). Very briefly, those papers showed that a construct-oriented
approach was used to identify and develop new measures that would
complement the Armed Services Vocational Aptitude Battery (ASVAB) in
terms of abilities measured and likelihood of increasing the prediction of
training and job performance. Both paper-and-pencil and
computer-administered measures were developed to tap constructs in
cognitive (primarily spatial) ability, perceptual/psychomotor, temperament,
biographical, and vocational interest domains. Pilot and field testing
results showed the new measures were psychometrically sound and were
measuring constructs relatively distinct from the ASVAB.

This paper describes some of the results of analyzing the properties
of the new measures, collectively called the Trial Battery, as exhibited in
the concurrent validity sample of Project A. This sample consisted of over
9,000 active duty soldiers in their first three years of service, from 19
different military occupational specialties. Other papers in this
symposium provide more detailed descriptions of the data collection
procedures and job performance criteria also collected from that sample
(Harris, 1986; Campbell, Hanser, & Wise, 1986).

New  Predictor  Factor  Scores 

The Trial Battery consisted of three major types of instruments:
1) six timed paper-and-pencil tests of cognitive spatial ability, 2) ten
computer-administered tests of perceptual/psychomotor ability, and 3) three
untimed paper-and-pencil inventories measuring temperament/biographical
data (the Assessment of Background and Life Experiences or ABLE),
vocational interests (the Army Vocational Interest Inventory or AVOICE), and
job reward preferences (the Job Orientation Blank or JOB), collectively
referred to as non-cognitive inventories.

Over 60 separate scores are obtained from the full Trial Battery.
Space does not allow presentation here of statistics for all these scores.
We used principal components factor analysis (varimax rotation) to identify
a smaller number of factor scores for use in validity analyses. Examination
of these solutions led us to choose 19 factor scores; these were
formed by simply summing the scores that defined each factor, not by using
a multiple-regression, factor-scoring method. Therefore, we are here using
the term factor to denote simply a higher-order organization of Trial
Battery test scores, and do not intend these factors as representations of
underlying psychological constructs. These 19 factors are simply a
parsimonious method of combining the larger number of individual scale
scores for purposes of validity analyses in a way that is faithful to their
covariances. Table 1 shows the names of these factors, the number of
scores making up the factor, the median reliability coefficients of the
scores entering each factor, and the median uniqueness estimate of the
factor. Figure 1 shows the names of the scale scores that made up each
factor, organized by type of instrument.
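
The unit-weighted scoring described above can be sketched in a few lines of
Python. This is an illustrative sketch only: the scale names and scores are
hypothetical, and standardizing each scale before summing is an assumption
(the paper says only that the scores defining each factor were summed).

```python
from statistics import mean, pstdev


def standardize(values):
    """Convert raw scores to z-scores (assumed step; not stated in the paper)."""
    m, s = mean(values), pstdev(values)
    return [(v - m) / s for v in values]


def unit_weighted_factor(scores_by_scale, scale_names):
    """Sum, per examinee, the standardized scores of the scales defining a factor."""
    columns = [standardize(scores_by_scale[name]) for name in scale_names]
    return [sum(row) for row in zip(*columns)]


# Hypothetical scores for three examinees on two scales of one factor.
scores = {
    "choice_reaction_time": [1.0, 2.0, 3.0],
    "simple_reaction_time": [10.0, 20.0, 30.0],
}
factor = unit_weighted_factor(
    scores, ["choice_reaction_time", "simple_reaction_time"]
)
print([round(x, 3) for x in factor])
```

Every scale gets a weight of one here, in contrast to the regression-based
factor scoring that the authors chose not to use.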

The medians of the internal consistency reliability coefficients range
from .46 to .93; mean = .78. All but four are greater than .70. One of
these, General Reaction Accuracy, is the sum of percent correct scores on
very simple, computerized perceptual tasks. These scores have, by design,
severely restricted variance; we were concerned primarily with General
Reaction Speed, which does have high reliability. The other three factors
with relatively low internal consistency reliability are from the Job
Orientation Blank, especially the Routine Work and Job Autonomy factors.
These are really just single scale scores, with only three or four items on
each scale, which probably accounts for the low values.

The test-retest reliabilities range from .13 to .85; mean = .67. The
paper-and-pencil measures all have reliabilities of .70 or greater, with
the exception of Food Service Interests, which is .66. The reliabilities of
the computer-administered measures, however, are between .46 and .62,
except for the .13 value for General Reaction Accuracy, which we discussed
above. Although these values are not as high as we would like, keep in
mind that these computerized tests are all relatively short (all ten tests
are administered in about one hour). Measures that prove most valid could
be lengthened to increase reliability. Also, we point out that these are
retest intervals of two to four weeks; test-retest coefficients reported for
computerized tests are often same-day or next-day intervals which, of
course, would yield much higher coefficients.
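
The effect of lengthening a measure can be approximated with the
Spearman-Brown prophecy formula. The sketch below is an illustration, not
the authors' analysis: the function names are hypothetical, and applying
the formula to a coefficient of .46 assumes that the added items behave
like the existing ones.

```python
def spearman_brown(reliability, length_factor):
    """Predicted reliability when a test is lengthened by `length_factor`."""
    return (length_factor * reliability) / (1 + (length_factor - 1) * reliability)


def required_length_factor(reliability, target):
    """Length multiple needed to raise `reliability` to `target`."""
    return (target * (1 - reliability)) / (reliability * (1 - target))


# Illustrative inputs: the lowest computerized value cited above (.46)
# and a target of .70.
doubled = spearman_brown(0.46, 2)
needed = required_length_factor(0.46, 0.70)
print(round(doubled, 2), round(needed, 2))
```

Under these assumptions, doubling a test with reliability .46 would raise
it to roughly .63, and reaching .70 would take a test a little under three
times as long.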

The uniqueness coefficients in Table 1 are indexes of the amount of
reliable variance that does not overlap with, or is unique from, other
measures (in this case, the ASVAB). The higher this index, the greater the
opportunity for incremental validity (over ASVAB). These values range
from .40 to .90; mean = .71. The Trial Battery measures, as a whole, do
appear to have high potential for incremental validity, especially for the
non-cognitive measures.
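
As a minimal sketch of the uniqueness index, the function below subtracts
the squared multiple correlation with the ASVAB from the reliability
estimate, which is how the Table 1 footnote appears to define it. The
function name and the input values are hypothetical and are not taken from
the table.

```python
def uniqueness(reliability, r_squared_with_asvab):
    """Reliable variance not shared with the ASVAB: reliability minus R squared."""
    for value in (reliability, r_squared_with_asvab):
        if not 0.0 <= value <= 1.0:
            raise ValueError("inputs must lie between 0 and 1")
    return reliability - r_squared_with_asvab


# Hypothetical measure: reliability .80, ASVAB tests account for 35% of variance.
u = uniqueness(0.80, 0.35)
print(round(u, 2))
```

A measure can score low on this index either because it is unreliable or
because the ASVAB already captures most of what it measures.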

In sum, with a few exceptions, the Trial Battery factors appear
reliable and relatively unique based on analyses of this large, concurrent
validity sample. We add that these results are highly similar to those
reported a year ago on a much smaller sample (about 200).

Prediction of Job Performance

Table 2 shows results of initial analyses of the validity of new
predictors for predicting job performance and Table 3 shows results of
initial analyses of the Trial Battery's incremental validity (over ASVAB)
for predicting job performance.

There are five criterion factors shown in both tables. The first two
represent "can do" factors and are made up largely of hands-on and written
job knowledge test scores (labeled Core Technical Proficiency and General
Soldiering Proficiency). The last three represent "will do" factors and
are made up largely of peer and supervisor ratings on behaviorally anchored
rating scales and self-reported administrative actions, such as awards and
Articles 15 (labeled Effort and Leadership; Personal Discipline; and
Physical Fitness and Military Bearing). As earlier stated, Campbell et
al. (1986) report in more detail the development of these criteria.

Six predictor composites are shown in Table 2, one made up of four
factors derived from the ASVAB; the other five made up from the Trial
Battery factor scores, combined within instrument type. The composites
were formed via multiple regression.

Several things are noteworthy about Table 2. First, it shows the
ASVAB does an excellent job of predicting the "can do" criteria, a
moderately good job for one of the "will do" factors (Effort), and a
relatively poor job for the other two "will do" factors. Second, it shows
that the Spatial and Perceptual/Psychomotor composites from the Trial
Battery follow a pattern similar to the ASVAB, but do not outpredict the
ASVAB. We point out that the perceptual/psychomotor, computer-administered
battery requires about 60-75 minutes to administer, but yields validities
of .49 and .56 for the "can do" criteria. Also, the six spatial tests
require about 90 minutes to administer, and do nearly as well as the ASVAB.
Finally, the non-cognitive portions of the Trial Battery do only moderately
well at predicting the "can do" criteria, but the ABLE equals or
outperforms the ASVAB and the cognitive/perceptual/psychomotor portions of
the Trial Battery for predicting the "will do" criteria. Indeed, the ABLE
is 13 and 16 points higher than the ASVAB for the Discipline and
Fitness/Bearing criteria. All in all, the overall pattern of the findings
in Table 2 is about what we expected.


Table 1

                                  Number    Median Reliability
                                    of        Coefficient (a)         Median
Factor                            Scores   Internal     Test-      Uniqueness (b)
                                           Consistency  Retest
---------------------------------------------------------------------------------
Overall Spatial                      6     .?3 (c)      .70           .?5
Psychomotor                          8     .?0          .?2           .71
Perceptual Speed/Accuracy            6     .?0          .?7           .72
Number Speed/Accuracy                8     .91          .58           .?7
General Reaction Speed               2     .95          .?6           .90
General Reaction Accuracy            2     .?2          .13           .45
Achievement                          5     .?2          .78           .?1
Dependability                        2     .77          .77           .74
Adjustment                           1     .?1          .74           .79
Physical Condition                   1     [illegible]  .?5           .?3
Skilled Technician Interests         7     .?9          .75           .?2
Structural/Machines Interests        4     .92          .?1           .75
Combat-Related Interests             3     .90          .?0           .75
Audiovisual Arts Interests           3     .?3          .74           .?1
Food Service Interests               2     .?1          .66           .78
Protective Service Interests         2     .?3          .76           .?1
Organization/Co-Worker Support       4     .?7          [illegible]   .?5
Routine Work                         1     .46          N/A           .40
Job Autonomy                         1     .?0          N/A           .47

Note: N varies, but all > 7,000. A "?" marks a digit that is illegible in
the source copy.

(a) These are odd-even coefficients, corrected with the Spearman-Brown
    procedure, or coefficient alpha for internal consistency, and
    correlations over a two-to-four-week interval, N = 47?, for test-retest.

(b) Uniqueness = r - R², where r = internal consistency reliability estimate
    and R² = squared multiple correlation of all ASVAB tests with each new
    predictor.

(c) This is based on a separately timed, split-half coefficient collected
    during pilot testing, N = 118, because some of these tests are speeded,
    making odd-even coefficients inappropriate.




FROM PAPER-AND-PENCIL TESTS

Overall Spatial
    Assembling Objects Test
    Map Test
    Maze Test
    Object Rotation Test
    Orientation Test
    Figural Reasoning Test

FROM COMPUTERIZED MEASURES

Psychomotor
    Cannon Shoot Test (Time Score)
    Target Shoot Test (Time To Fire)
    Target Shoot Test (Log Distance)
    Target Tracking 1 (Log Distance)
    Target Tracking 2 (Log Distance)
    Pooled Mean Movement Time

Perceptual Speed and Accuracy
    Short Term Memory Test (Percent Correct)
    Perceptual Speed & Accuracy Test (Decision Time)
    Perceptual Speed & Accuracy Test (Percent Correct)
    Target Identification Test (Decision Time)
    Target Identification Test (Percent Correct)

Number Speed and Accuracy
    Number Memory Test (Percent Correct)
    Number Memory Test (Initial Decision Time)
    Number Memory Test (Mean Operations Decision Time)
    Number Memory Test (Final Decision Time)

General Reaction Speed
    Choice Reaction Time
    Simple Reaction Time

General Reaction Accuracy
    Choice Reaction Percent Correct
    Simple Reaction Percent Correct

FROM NON-COGNITIVE INVENTORIES

Organizational and Co-Worker Support (JOB)
    Job Pride
    Job Security Comfort
    Serving Others
    Ambition

Routine Work (JOB)
    Routine

Job Autonomy (JOB)
    Autonomy

Achievement (ABLE)
    Self-Esteem Scale
    Work Orientation Scale
    Energy Level Scale

Dependability (ABLE)
    Conscientiousness Scale
    Non-Delinquency Scale

Adjustment (ABLE)
    Emotional Stability Scale

Physical Condition (ABLE)
    Physical Condition Scale

Skilled Technician Interest (AVOICE)
    Clerical/Administrative
    Medical Services
    Leadership/Guidance
    Science/Chemical
    Data Processing
    Mathematics
    Electronic Communications

Structural/Machines Interest (AVOICE)
    Mechanics
    Heavy Construction
    Electronics
    Vehicle/Equipment Operator

Combat Related Interest (AVOICE)
    Combat
    Rugged Individualism
    Firearms Enthusiast

Audiovisual Arts Interest (AVOICE)
    Drafting
    Audiographics
    Aesthetics

Food Service Interest (AVOICE)
    Food Service Professional
    Food Service Employee

Protective Services Interest (AVOICE)
    Law Enforcement
    Fire Protection


Figure 1. Test and inventory scale scores making up Trial Battery
Predictor Factors.




Table 2

Multiple Correlations of Six Independent Predictor Composites with Each of
Five Job Performance Criterion Factors.

Predictor composites (K = number of predictor scores in the composite):
    ASVAB     ASVAB Composite (K = 4)
    Spatial   Spatial Abilities Composite (K = 1)
    Perc/Psy  Perceptual/Psychomotor Abilities Composite (Computerized) (K = 5)
    JOB       JOB Composite (Preferences) (K = 3)
    ABLE      ABLE Composite (Temperament/Biodata) (K = 4)
    AVOICE    AVOICE Composite (Interests) (K = 6)

Criterion Factor                  ASVAB  Spatial      Perc/Psy  JOB   ABLE         AVOICE
------------------------------------------------------------------------------------------
1. Core Technical Proficiency     .?0    [illegible]  .49       .26   [illegible]  .35
2. General Soldiering Proficiency .?6    [illegible]  .56       .?9   .?5          .37
3. Effort and Leadership          five entries legible, in scan order:
                                  .?5, .?0, .19, .?4, .26
4. Personal Discipline            .19    .16          .14       .11   .32          .15
5. Physical Fitness and           five entries legible, in scan order:
   Military Bearing               .21, .11, .11, .?7, .12

Note: Entries in the table are averaged across 9 Army MOS with complete
sets of job performance criterion measures. Total sample size is 3902.
Sample sizes range from 281 to 570; median = 432. A "?" marks a digit that
is illegible in the source copy.

Multiple Rs are adjusted for shrinkage and corrected for restriction in
range, but not corrected for criterion unreliability.

K = the number of predictor scores in the composite.


Table 3

Increments in Multiple Correlations (Over R Using ASVAB Composite) as a
Function of Adding Trial Battery Factor Scores for Each of Five Job
Performance Criterion Factors.

                           Core        General
                           Technical   Soldiering   Effort and  Personal     Fitness &
Predictor                  Proficiency Proficiency  Leadership  Discipline   Bearing
---------------------------------------------------------------------------------------
ASVAB Composite Alone
(K = 4)                    .60         .66          .?5         .19          .20

ASVAB Plus Trial Battery
Factors (K = 23)           .64         .70          .45         [illegible]  .42

Increment                  [illegible] [illegible]  .10         [illegible]  .22

Note: Entries in the table are averaged over 9 Army MOS with complete sets
of criterion measures. Total sample size is 3902. Sample sizes within MOS
range from 281 to 570; median = 432. A "?" marks a digit that is illegible
in the source copy.

Multiple Rs are adjusted for shrinkage and corrected for restriction in
range, but not corrected for criterion unreliability.

K = the number of predictor scores in the composite.




While the AVOICE does not show higher prediction than the ASVAB for
the "can do" criteria, it is interesting that it correlates .33 and .37
with those criteria. The AVOICE was intended primarily to assist in
classification rather than prediction per se, so it is encouraging to see
these correlations with "can do" criteria. Finally, with respect to Table 2,
we note that the JOB, ABLE, and AVOICE are expected to add most to the
prediction of attrition; those analyses have not been done yet.


Table 3 shows a first, very crude look at the incremental validity of
the Trial Battery. In these analyses, we simply added all 19 Trial Battery
factor scores to the ASVAB factor scores and looked at the increase in the
multiple correlation. The third row in Table 3 shows that 1) the
prediction of all five criteria is increased, 2) little increase occurs for
the "can do" criteria, and 3) sizeable increases occur for the "will do"
criteria.
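
The logic of an incremental validity check can be illustrated with the
closed-form multiple correlation for two predictors. This is a simplified
sketch, not the analysis reported here: it treats two composites as single
predictors, the intercorrelation of .20 is a hypothetical value, and the
Wherry formula is assumed for the shrinkage adjustment because the paper
does not name the formula it used.

```python
from math import sqrt


def multiple_r_two_predictors(r_y1, r_y2, r_12):
    """Multiple R of a criterion on two predictors, from their correlations."""
    r_squared = (r_y1**2 + r_y2**2 - 2 * r_y1 * r_y2 * r_12) / (1 - r_12**2)
    return sqrt(r_squared)


def shrunken_r(multiple_r, n, k):
    """Wherry-adjusted multiple R for n cases and k predictors (assumed formula)."""
    adjusted = 1 - (1 - multiple_r**2) * (n - 1) / (n - k - 1)
    return sqrt(max(adjusted, 0.0))


# Illustrative values: predictor-criterion correlations of .19 and .32,
# a hypothetical predictor intercorrelation of .20, and n = 432.
r_first = 0.19
r_both = multiple_r_two_predictors(r_first, 0.32, 0.20)
increment = r_both - r_first
adjusted = shrunken_r(r_both, n=432, k=2)
print(round(r_both, 3), round(increment, 3), round(adjusted, 3))
```

The increment is the gain in R over the first predictor alone. With many
added predictors and small samples the shrinkage adjustment matters more
than it does in this two-predictor case.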


Efforts are underway now to make more refined Trial Battery composites
and to estimate the classification efficiency increments obtained via use
of the Trial Battery. These initial results, however, show that the new
predictors do 1) predict soldiers' job performance at meaningful levels in
the way that was expected and 2) make meaningful increments over the ASVAB
to validity for important aspects of soldiers' job performance.


References 

Campbell, J., Hanser, L., & Wise, L. (1986). The development of a model
of Project A criterion space. Paper presented at the 28th Annual
Military Testing Association Conference, Mystic, Connecticut.

Harris, J. (1986). The Project A concurrent validation data collection.
Paper presented at the 28th Annual Military Testing Association
Conference, Mystic, Connecticut.

Hough, L. M., McGue, M. K., Kamp, J. D., Houston, J. S., & Barge, B. N.
(1985). Measuring personal attributes: Temperament, biodata, and
interests. Paper presented at the 27th Annual Military Testing
Association Conference, San Diego.

McHenry, J., & Toquam, J. L. (1985). Computerized assessment of
perceptual and psychomotor abilities. Paper presented at the 27th
Annual Military Testing Association Conference, San Diego.

Peterson, N. G. (1985). Mapping predictors to criterion space: Overview.
Paper presented at the 27th Annual Military Testing Association
Conference, San Diego.

Rosse, R. L., & Peterson, N. G. (1985). Using microcomputers for
assessment: Practical problems and solutions. Paper presented at the
27th Annual Military Testing Association Conference, San Diego.

Toquam, J. L., Dunnette, M. D., Corpe, V. A., & Houston, J. S. (1985).
Adding to the ASVAB: Cognitive paper-and-pencil measures. Paper
presented at the 27th Annual Military Testing Association Conference,
San Diego.


Note: This research was funded by the U.S. Army Research Institute for the
Behavioral and Social Sciences, Contract Number MDA903-82-C-0531.
All statements expressed in this paper are those of the authors and
do not necessarily reflect the official opinions or policies of the
U.S. Army Research Institute or the Department of the Army.




EFFECT  OF  PRACTICE  ON  SOLDIER  TASK  PERFORMANCE 


Paul  Radtke 
Dorothy  S.  Edwards 

American  Institutes  for  Research 


Presented  on  Symposium, 

"Job  Performance:  What  Do  Soldiers  Know,  What  Can  They  Do?" 

At  the  Annual  Conference  of  the 
Military  Testing  Association 
Mystic,  Connecticut 

November 1986


The  views  expressed  in  this  paper  are  those  of  the  authors  and  do  not 
necessarily  reflect  the  official  opinions  and  policies  of  the  U.S.  Army 
Research  Institute  or  the  Department  of  the  Army. 



One of the forms administered in the Army's Selection and
Classification study, usually known as Project A, was a Job History
Questionnaire. For each of nine Military Occupational Specialties (MOSs) the
form listed all of the tasks covered by paper-and-pencil knowledge tests
and by hands-on performance tests. These tasks were selected from the
domain of tasks for an MOS by a panel of experts because they were done
frequently and were important to overall job performance. About thirty
tasks were selected for each MOS; all were measured with performance-based
knowledge tests; about half were also measured with hands-on tests.

In the Job History Questionnaire soldiers were asked to indicate how
often during the past six months they had performed each task, using a
scale of "Not at all, 1-2 times, 3-5 times, 6-10 times, or more than 10
times." Next, soldiers indicated how recently they had performed each
task, using a scale of "Never, during past month, 1-3 months ago, 4-6
months ago, or more than 6 months ago."

The frequency and recency ratings were correlated with the scores on
the knowledge tests and with the hands-on tests for each MOS. The results
for two sample MOSs, one combat and one support MOS, are shown in
Tables 1-2. The number of cases for these correlations varies, but in
every case is substantial. The minimum and maximum N is given at the top
to reduce the number of columns in the tables. When there is a wide range
in the number of cases it reflects a smaller N on one or two tests and
nearly maximum Ns on the others. With Ns this large, even a rather small
correlation is statistically significant; such small correlations
probably have little practical significance. Note that the recency
correlations should be negative, because of the way the scale was written.
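
The point about sample size can be made concrete by computing the smallest
correlation that reaches significance for a given N. The sketch below uses
the Fisher z approximation and assumes a two-tailed test at the .01 level
with a hypothetical N of 600; the paper does not state which test was used,
so the exact cutoff may differ.

```python
from math import sqrt, tanh
from statistics import NormalDist


def critical_r(n, alpha=0.01):
    """Approximate smallest |r| significant at `alpha` (two-tailed, Fisher z)."""
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    return tanh(z_crit / sqrt(n - 3))


def variance_explained(r):
    """Proportion of criterion variance associated with a correlation r."""
    return r * r


threshold = critical_r(600)        # hypothetical sample size
share = variance_explained(0.10)   # a correlation just below that cutoff
print(round(threshold, 3), round(share, 2))
```

Under these assumptions a correlation of about .10 is enough to reach
significance, yet it corresponds to only about one percent of the variance
in test scores, which is the sense in which such correlations have little
practical significance.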

The tables have some items of interest, however. Recency appears to
be more closely associated with test performance than does frequency of
practice, in that more of these correlations attain statistical
significance. Recency and frequency are correlated, as shown in the last
column of the tables.

There is a tendency for the more complex tasks to be more highly
correlated with frequency and recency, though there are some exceptions in
both directions: complex tasks not correlated or easy tasks correlated.

Performance on MOS-specific tasks tends to be more highly correlated
with frequency and recency of practice than performance on the common


*This research was funded by the U.S. Army Research Institute for the
Behavioral and Social Sciences, Contract No. MDA903-82-C-0531. All
statements expressed in this paper are those of the authors and do not
necessarily express the official opinions or policies of the U.S. Army
Research Institute or the Department of the Army.




Table 1. Correlation between Questionnaire Scales (Frequency and Recency of
Performance) and Task Test Scores for 11B (Infantryman)

Decimal points omitted; * = significant at p = .01

Columns: K Tests (N = [illegible]): Freq., Rec.
         HO Tests (N = 496-6??): Freq., Rec.

11B tasks, in the order listed:
    Perform CPR
    Adm Nerve Agent Antidote
    Put on Field/Pres. Dressing
    Perform Op Maint. on M16A1
    Load/Reduce/Clear M60
    Engage w/ Hand Grenades
    Prepare Dragon for Firing
    Prepare Range Card for M60
    Call for/Adjust Indirect Fire
    Navigate on the Ground
    Id Terrain Features on Map
    Put on M17 Mask
    Put on Protective Clothing
    Collect/Report Info
    Camouflage Self/Equip
    Id Armored Vehicles
    Move under Direct Fire
    Estimate Range
    Move over Obstacles
    Operate Radio Set AN/PRC-77
    Install/Fire Claymore Mine
    Tech of Urban Terr Movement
    Select Hasty Urban Firing Pos
    Establish Obs Post
    Sel Fire Team Overwatch Pos
    Zero AN/PVS-4 to M16A1
    Place AN/PVS-5 into operation
    Set Headspace/Timing
    Engage Target w/ LAW

Correlation entries as (Freq., Rec.) pairs in scan order; the alignment of
pairs with tasks and with the K and HO columns is not recoverable from the
source copy:

    (04, -17*)   (04, -13*)   (03, -06)    (09, -09)    (06, -00)
    (00, -04)    (12*, -09)   (25*, -20*)  (00, -13*)   (05, -00)
    (14*, -16*)  (10*, -10*)  (00, -10)    (24*, -19*)  (16*, -16*)
    (10*, -20*)  (10*, -10*)  (02, -05)    (07, -00)    (02, -15*)
    (-06, -06)   (07, -07)    (15*, -13*)  (00, -03)    (-02, 04)
    (04, -01)    (09, -10*)   (04, -02)    (07, -07)    (22*, -19*)
    (-06, 03)    (-03, -04)   (-02, -03)   (01, -00)    (02, -03)
    (-02, -02)   (-06, -17*)  (09, -04)    (37*, -37*)  (41*, -35*)
    (-04, -03)

Table 2

Columns: K Tests (N = 490-500): Freq., Rec.
         HO Tests (N = 494-500): Freq., Rec.
         Frequency & Recency

71L tasks, in the order listed:
    Adm Nerve Agent Anti - Self
    Load/Clear M16A1
    Oper Maint M16A1
    Det Magnetic Azimuth
    Det Grid Coordinates
    Put on M17 Mask
    Maintain M17 Mask
    Put on Protective Clothing
    Know Rights as POW
    Camouflage Self/Equip
    Prac Noise/Light/Litter Disc
    File Documents/Corresp
    Est Functional Files
    Control Supplies
    Rec/Contl Office Equip
    Dispatch Outgoing Dist.
    Type Military Orders
    Type 2nd Comment to DF
    Type Jt Message Form
    Type a Memo
    Type a Basic Comment to DF
    Assemble Correspondence
    Type Military Letter
    Safeguard FOUO Material
    Trans Classified Material
    Put on Field/Press Dressing
    Prep. Requisition/AUTODIN

Correlation entries in scan order; the alignment of entries with tasks and
columns is not recoverable from the source copy:

    -49   07    -20*  -32   -01   -01   -10   -36   06    -06
    02    -34   -02   -00   -16*  -40   00    -17*  09    -34
    -05   03    07    -21*  -46   04    -09   -45   10*   -10*
    -63   09    -15*  -55   14*   -12*  -67   -16*  -16*  -12*
    [illegible]  10*   -13*  10    -72   04    -06   -08   07    -09
    -84   02    00    -79   11*   -10*  -11*  -77   01    -02
    10    -80   32*   -30*  12*   -11*  -79   15*   -11*  00
    -14*  -76   14*   -19*  00    -14*  -80   20*   -20*  20*
    -24*  -79   16*   -13*  -15*  -80   25*   -24*  10*   -84
    03    -04   -12*  -82   10    -06   16*   [illegible]  -13*  17*
    -14*  [illegible]



tasks. It may be that common tasks have been subject to more practice
during the soldier's enlistment. This hypothesis is consistent with the
generally higher mean scores on the common tasks. If true, the common
tasks may have been "overlearned," and thus less subject to forgetting or
to decrement through lack of practice.

Some  common  tasks  were  tested  In  more  than  one  MOS.  This  allows  us 
another  way  to  look  for  consistency  In  association  of  test  scores  and 
frequency  or  recency  of  practice.  Table  3  shows  these  data  for  the  common 
tasks.  One  task,  "Determine  grid  coordinates"  shows  significant  correla¬ 
tions  with  frequency  and  recency  in  six  of  the  seven  MOSs  in  which  the 
knowledge  test  was  given.  It  also  showed  significant  correlations  in  the 
hands-on tests in most of the MOSs in which it was given. It is the
consistency  of  the  findings  rather  than  the  magnitude  of  the  actual  corre¬ 
lations  that  makes  us  believe  that  competency  in  this  task  is  indeed 
related  to  frequency  and  recency  of  practice.  The  test  was  very  similar 
in  both  measurement  methods:  soldiers  had  to  read  grid  coordinates  using  a 
protractor.  They  had  an  advantage  in  the  written  mode  in  that  the  correct 
answer  appeared  as  one  of  four  choices,  whereas  they  had  to  report  the 
coordinates  to  the  test  administrator  in  the  hands-on  mode  without  the 
recognition  advantage  afforded  by  the  multiple  choice  item. 

A  second  test  that  has  a  similar  pattern  of  significant  correlations 
with  the  knowledge  tests  is  "Put  on  and  wear  protective  clothing."  This 
test,  however,  does  not  correlate  with  the  hands-on  measure.  Since  the 
soldier  must  put  on  the  clothing  required  at  four  progressive  levels  of 
protection,  over-dressing  at  phase  1,  or  MOPP  Level  1,  as  it  is  called, 
could  keep  the  soldier  from  correctly  reaching  the  higher  levels. 

Naturally  we  looked  for  characteristics  that  these  two  tasks  have  in 
common  that  are  not  present  in  other  tasks  that  do  not  show  this  pattern 
of  correlations.  We  found  only  one.  Each  of  the  tasks  requires  a  specific 
procedure  that  terminates  in  an  objectively  verifiable  product  or  result. 
Exact  grid  coordinates  are  determined  and  reported,  and  certain  garments 
are  worn  at  each  MOPP  level.  This  means  that  the  "right  answers"  are 
totally  unequivocal  and  readily  observable  by  even  a  careless  scorer  in 
the  hands-on  mode.  These  tests  had  reliability  estimates  that  were  among 
the  highest  in  the  MOSs  in  which  they  appeared,  which  is  probably  also  a 
function  of  the  clarity  and  observability  of  the  response. 

Another test that is fairly consistent in correlations with frequency
and recency is "Load, reduce, and clear the M60 machinegun." It was given
in only three MOSs, so the consistency cannot be as pronounced as with the
grid coordinates and protective clothing tests. Table 4 shows the
correlations for this task as well as those for a similar task: "Load,
reduce, and clear the M16A1 rifle." Performance on the M16 tests is not as
highly correlated with frequency, probably because it is the soldier's main
weapon and is more often practiced and proficiency is maintained at a high
level. The task is also somewhat simpler than the matching task on the M60.

At  the  bottom  of  Table  4  we  have  shown  the  mean  percent  passing  the 
knowledge  tests  and  the  mean  percent  "GO"  on  the  hands-on  test  for  all 
MOSs  in  which  the  M60  and  M16  tasks  were  covered.  Note  that  performance 
on  the  hands-on  test  is  higher  than  on  the  knowledge  test  for  both  tasks, 




Tahl*  3.  Corrtlatlons  Bttuttn  Job  History  Questionnaire 
Scales  and  Scores  on  Connon  Soldiering  Tasks 
OeclMl  points  omitted:  *-  significant  at  P  -  .01 

A. Frequency - Knowledge Tests

K Tests (MOS: 11B, 13B, 19E, 31C, 63B, 64C, 71L, 91A, 95B)

CPR: 04, -02, 03, 09, 00, -11*
Nerve agent: 04, 02, 01, 07, 07
F/P dressing: 03, 10, 15*, 09, 07, -03, 10*
LRC M16: -09, -01, 01, -02, -01, 07, 00
Op/Mtn M16: 06, -03, 05, 03, 06
LRC M60: 12*, 18*, 20*
Mag. Azim.: 04, -02, 12*
Grid Coor.: 15*, 16*, 13*, 13*, 08, 18*, 08
Put on mask: 02, 07, 05, 05, -05, 08
MOPP: 02, 07, 12*, 09, 16*, 15*, 18*, 14*
CEOI: 38*, 24*

B. Recency - Knowledge Tests

CPR: -17*, 02, -09, -10*, 00, -13*
Nerve agent: -13*, -12*, -11*, -06, -20*
F/P dressing: -06, -04, -12*, -07, -08, -05, -08
LRC M16: 00, -12*, -02, -01, 01, -16*, -02
Op/Mtn M16: -08, 00, -02, -10*, -06
LRC M60: -09, -15*, -22*
Mag. Azim.: -06, -08, -05
Grid Coor.: -12*, -23*, -18*, -18*, -17*, -20*, -04
Put on mask: -05, -04, 00, -04, 03, -06
MOPP: -13*, -04, -12*, -15*, -11*, -12*, -18*, -12*
CEOI: -29*, -27*

C. Frequency - HO Tests

CPR: 12*, 15*, 13*, 17*
Nerve agent: 01, 11*
F/P dressing: 09, 06, 04, 02, 01, [illegible], 07, 14*
LRC M16: 01, -03, -01, 01, 06
Op/Mtn M16: 08, 05, 02
LRC M60: 25*, 14*, 07
Mag. Azim.: 01, 07
Grid Coor.: 09, 22*, 13*, 09, 18*, 17*
Put on mask: 07, 04, -04, 08, 07, 09
MOPP: 05, 01, 04
CEOI: 31*

D. Recency - HO Tests

CPR: -19*, -18*, -09, -15*
Nerve agent: -01, -10*
F/P dressing: -09, -10, -15*, -06, -08, [illegible], -11, -13*
LRC M16: 01, -07, -02, 02, -12*
Op/Mtn M16: -04, -07, -10
LRC M60: -20*, -30*, -03
Mag. Azim.: -12*, -02
Grid Coor.: -06, -21*, -17*, -16*, -19*, -13*
Put on mask: 00, [illegible], 00, -02, -06, -21*, -06
MOPP: -06, -07, -08
CEOI: -27*


but the performance on the M16 weapon is superior to performance on the
M60. The M60 task is somewhat more complex, and has more steps, but the
M16 is almost certainly practiced more often. Soldiers do appear to be
able to load, reduce, and clear their primary weapon, as indicated by the
mean of 85% GO on the hands-on test.


Table  4.  Correlations  between  frequency  and 
recency  of  practice  and  test  scores  on  two 
weapons,  the  M60  machinegun  and  the  M16  rifle 


[MOS column headings illegible]

Frequency

LRC M60   K     12*, 18*, 20*
LRC M60   HO    25*, 14*, 07
LRC M16   K     -09, -01, 01, -02, -01, 07, 00
LRC M16   HO    01, -03, -01, 01, 06

Recency

LRC M60   K     -09, -15*, -22*
LRC M60   HO    -20*, -30*, -03
LRC M16   K     00, -12*, -02, -01, 01, -16*, -02
LRC M16   HO    01, -07, -02, 02, -12*

                        LRC M16    LRC M60
Mean % correct, K        72.79      61.80
Mean % GO, H-O           85.84      68.35


A final test that shows substantial correlation with both frequency and
recency of practice is "Use automated CEOI" (Communications Electronics
Operating Instructions). It was given in only two MOSs, and is similar to
grid coordinates in that it produces an objectively observable result.
The correlations were as shown below:


                            Tank Crewman    MP

CEOI K-test & freq.             38*         24*
CEOI K-test & recency          -29*        -27*
CEOI H-O test & freq.           31*         Not given
CEOI H-O test & recency        -27*         Not given

This  test  requires  memory  of  procedures  for  looking  up  information  in  a 
table  and  reporting  call  signs,  radio  frequencies,  and  authentication 
data.  A  number  of  soldiers  taking  the  hands-on  test  reported  on  how 
easily  the  procedures  for  reading  the  table  are  forgotten. 

Conclusions 

The ratings on frequency and recency of practice of tasks tested in
Project A show very low correlations with test performance. There are,
however, some tasks that show a significant relationship, and in a
consistent enough manner to suggest that we are not dealing with chance
results.

Tasks that are related to practice seem to be those that produce
objectively observable results, that are relatively complex, and that are
related to the MOS-specific parts of the job rather than to the common
soldier tasks.


Reference 

Campbell, J. P. (1986, August). Project A: When the textbook goes
operational. Paper presented at the 94th Annual Convention of the American
Psychological Association, Washington, DC.




SOME  CONDITIONS  AFFECTING  ASSESSMENT  OF  JOB  REQUIREMENTS 


Elizabeth  P.  Smith 
Paul G. Rossmeissl

U.S.  Army  Research  Institute 


Presented  on  Session,  "Improving  Training  Performance" 

At  the  Annual  Conference  of  the 
Military  Testing  Association 
Mystic,  Connecticut 

November  1986 


The  views  expressed  In  this  paper  are  those  of  the  authors  and  do  not 
necessarily  reflect  the  official  opinions  and  policies  of  the  U.S.  Army 
Research  Institute  or  the  Department  of  the  Army. 


Some Conditions Affecting Assessment of Job Requirements

Elizabeth P. Smith¹

U.S. Army Research Institute
for the Behavioral and Social Sciences

Paul G. Rossmeissl²
Hay Systems, Inc.


As an adjunct to the Army Research Institute's Project A to improve the
selection and classification process (Eaton et al., 1984), research was
initiated to develop and test a rating scale method to assess human
attributes (e.g., abilities, interests, etc.) that are needed for success
in a particular Military Occupational Specialty (MOS) (Smith, 1985). The
work followed from the ability taxonomy and rating scale work by Fleishman
and his associates (see Fleishman & Quaintance, 1984). Within Project A, a
taxonomy of human attributes that affect performance was developed from
expert judgments of validity (Wing, Peterson, & Hoffman, 1984). The
taxonomy included 21 clusters of cognitive/perceptual, psychomotor, and
noncognitive (temperament and interests) variables. Smith (1985)
constructed a set of scales corresponding to 20 of these attributes plus
physical strength and stamina. This set of scales, the Attribute Assessment
Scale (AAS), which was designed to use work supervisors as Subject Matter
Experts (SMEs), contains primarily Army-specific behavioral anchors.
Several problems were uncovered during preliminary tests of the instrument
with two different samples (Smith & Rossmeissl, in process). The research
which is presented here attempted to address those issues. As with the
earlier research, the goal was to demonstrate that the scales can produce
reliable, differential profiles of attribute requirements that
discriminate across MOS. These profiles then could be matched to measures
of an individual's attributes for selection and classification purposes.

In the first test of the AAS (Smith, 1985), senior noncommissioned
officers (NCOs) from two MOS provided ratings of the requirements for entry
level work in their own MOS for three performance levels (15th, 50th, and
85th percentiles). Two types of Intraclass Correlation Coefficients (ICCs)
were calculated over all attributes. The first (r_1) provides a point
estimate of interrater reliability or the reliability of a single rater.
The second (r_k) indicates the reliability of the mean rating. These
coefficients were extremely weak. There was very little interrater
agreement and at least 30 raters were needed to obtain moderately reliable
means, a number higher than would be practical in operational use. An
ANOVA indicated that attribute profiles for the two MOS were not
significantly different.

There appeared to be three major problems related to the instrument and
the research. First, the inclusion of three performance levels may have had
a strong, negative impact on the results. The demands of the task appeared
to impose a unique kind of restriction in the range of possible ratings,
plus it took considerable effort. Second, the multiple levels added more
confusion to a performance criterion which was already very broad (all
work within all duty positions), allowing for considerable variance. The
third problem centered on the scale anchors. This included SMEs'
frustration with their content and/or difficulty in using them as reference
points for evaluating duties within their MOS.

¹The views expressed in this paper are those of the authors and do not
necessarily reflect the views of the U.S. Army Research Institute or the
Department of the Army.

²Affiliated with U.S. Army Research Institute at the time this research
took place.

The second test of the AAS (Smith & Rossmeissl, in process) considered
two of these issues. SMEs were a small number of officers and NCOs from
three MOS. We provided a written job description from Army Regulation
611-201, and SMEs gave a single rating of the level of each attribute
required for "average" performance of entry level work in their own MOS. An
important aspect of the research was a post-rating discussion period during
which SMEs provided information about problems that they had in completing
the task, specific issues related to interpretation of "average"
performance, confidence in their responses, and ways to improve the
procedures.

With the exception of one MOS for which procedural problems were noted,
the results were promising. Overall, the magnitudes of the ICCs were better
than those obtained in the original research. Reliabilities of mean rating
(r_k) equal to .73 and .84 with only 4 and 9 raters respectively were
encouraging. ANOVA results indicated no significant differences in profiles
across MOS, but given the small sample sizes this was not surprising. Our
post-rating discussions indicated that use of the criterion "average"
performance may have reduced MOS differences as well. Problems with this
terminology included some tendency a) to describe the average soldier
rather than, e.g., the average Administrative Specialist, b) to confuse
average performance with average level of requirements, and c) to view
average performance as actually substandard. The discussions also confirmed
there were still problems related to the anchors and the
ambiguity/enormity of the "whole job" criterion.

Given these outcomes, we decided to test the rating scales again under
different conditions. In this research we examined ratings of attribute
requirements for the whole MOS versus ratings of important, representative
component tasks using two sets of scales with different anchors.

METHOD

Sample

One hundred fifty-nine NCOs from three MOS (Cannon Crewman: 13B, Light
Wheel Vehicle Mechanic: 63B, and Single Channel Radio Operator: 31C) at two
posts served as SMEs.

Procedure 

Within MOS and posts, SMEs were assigned in blocks of 12 or less to one
of 4 condition groups. Group I rated the job as a whole, using the
original, behaviorally-anchored AAS. Group II rated the job as a whole,
using scales with generic anchors (1=very low, 4=moderate, 7=very high).
Groups III and IV rated the attribute requirements for 15 component tasks
of their MOS. The tasks were those used in the hands-on testing portion of
Project A. Group III used the behaviorally-anchored scales; Group IV, the
generically-anchored ones. SMEs estimated the levels of the 22 attributes
which are required for "successful performance" of Skill Level 1 work for
their own MOS. SMEs in the previous research favored this choice of
performance criterion. In addition to the written instructions, we provided
SMEs with brief training in how to use the scales to derive ratings.

Analyses 

To determine reliability, we calculated ICCs (r_1 and r_k) from Attribute
X Rater ANOVAs by group for each MOS. To compare reliabilities based on
same sized groups, we estimated reliability of mean ratings based on 6
raters (r_6) using the Spearman-Brown formula. We performed an MOS X
Attributes X Anchor (Generic vs. Behavioral) X Criterion (Whole Job vs.
Tasks) univariate repeated-measures ANOVA to examine differences in
profiles among MOS and any effects due to anchor or criterion conditions.
The single, highest rating assigned to any task within each attribute was
used in the ANOVA.
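
The Spearman-Brown step described above can be sketched as follows. This is a minimal illustration, not the authors' analysis code: the function names are hypothetical, and the check values (.22 for a single rater, k = 12) are taken from one row of Table 1.

```python
def spearman_brown(r_single: float, k: float) -> float:
    """Reliability of the mean of k raters, given single-rater reliability."""
    return k * r_single / (1.0 + (k - 1.0) * r_single)


def raters_needed(r_single: float, target: float) -> float:
    """Number of raters whose mean rating would reach the target reliability."""
    return target * (1.0 - r_single) / (r_single * (1.0 - target))


# One Table 1 row: single-rater ICC of .22 from a group of 12 raters.
r_1 = 0.22
print(round(spearman_brown(r_1, 12), 2))  # reliability of the 12-rater mean
print(round(spearman_brown(r_1, 6), 2))   # same value stepped to 6 raters
```

Stepping every group to a common six raters removes group size from the comparison, which is why r_6 rather than r_k is the fairer column for comparing conditions.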

Results

The ICCs (r_1, r_k, and r_6) for the conditions by MOS are given in
Table 1. Overall, estimates of interrater agreement are low. The best
estimates are for Radio Operators across all 4 conditions, yet there still
are large between-subjects variances for all MOS. Across MOS, no particular
condition yielded higher r_1s or r_ks than another.

Table 1

Reliability estimates for a single rater, mean of k raters, and
mean of six raters of three MOS by experimental conditions

MOS              Anchor Type   Criterion    k     r_1    r_k    r_6

Cannon Crewman   Behavioral    Task         19    .06    .61    .34
                               Job          12    .22    .77    .63
                 Generic       Task         25    .07    .67    .31
                               Job          12    .04    .36    .20

Radio Operator   Behavioral    Task          9    .28    .77    .70
                               Job          12    .22    .77    .63
                 Generic       Task         12    .17    .71    .55
                               Job          17    .19    .80    .58

Mechanic         Behavioral    Task         22    .11    .74    .43
                               Job           7    .12    .48    .45
                 Generic       Task          6    .07    .32    .32
                               Job           6    .21    .61    .61

The MOS X Attribute X Anchor X Criterion ANOVA indicated there are
significant differences in attribute profiles across MOS and that these
differences were affected by the experimental conditions. Although the
4-way interaction is not significant, two 3-way interactions (Attribute X
MOS X Anchor and Attribute X MOS X Criterion) and all 2-way interactions
involving attribute are significant with a Geisser-Greenhouse p<.05. That
is, mean [text missing in source] a function of the type of anchors or the
criterion. Collapsing over type of criterion, generically-anchored scales
yielded higher mean ratings for all attributes. On the other hand, the
effects of criterion condition (job vs. tasks) were dependent on the type
of attribute. For the most part, across MOS, we found higher means for
evaluations of the whole job for the cognitive/perceptual attributes, some
of the psychomotor attributes, and, of the noncognitive attributes,
realistic and investigative interests. The opposite was true for physical
strength, stamina, and the other noncognitive (temperament) attributes.
Figures 1(a-c) graphically depict the three 2-way interactions.


[Figure 1: three profile plots of mean ratings by attribute; plots and
attribute legend not legible in source]

Figure 1. Comparison of profiles of attribute means by (a) MOS, (b)
Criterion (Job vs. Task), and (c) Anchor (Behavioral vs. Generic).


DISCUSSION


As with initial tests of the AAS, the interrater agreement found here is
relatively low. For most purposes, however, we are more interested in the
reliability of the mean ratings, which were moderate for most of the
conditions. Use of generically-anchored scales did not improve the
reliabilities as our previous research had suggested, but the behaviorally
anchored scales were no more reliable than the generic anchors. In effect,
however, using behavioral anchors tended to lower mean ratings, perhaps by
reducing a "more means better" tendency toward inflating estimates of
requirements for good performance. These findings suggest that in similar
situations the impact of using behaviorally based anchors may not merit
their increased developmental effort and cost.

Similarly, to the degree it was tested here, having SMEs rate components
of the job did not increase agreement among raters either. In our analyses
we used only one of the 15 ratings made by SMEs in the task rating
conditions. Perhaps we would find better interrater reliability if we
focused on each task individually. The choice of criterion did affect
magnitude of ratings, but not in the same way for all attributes.
Differences in means, as well as lack of agreement among raters, may well
have been a function of the comprehensiveness or representativeness of the
tasks. Some SMEs argued that the specific tasks we used required little or
none of some attributes (especially temperament attributes), but that these
attributes are required for other aspects of the job. A few SMEs indicated
they gave high ratings on the tasks for this reason, thus ignoring our
instructions to rate only the 15 tasks provided.

Although we were unable to increase reliability by altering the
conditions of the administration of the AAS, the data were sufficiently
reliable to yield meaningful results. The key interaction of MOS and
attribute was statistically significant: We did attain significantly
different requirements profiles across MOS mean ratings. Also significant
were the comparisons investigating the effects of anchor type and level of
analysis (job versus task). In other words, while the reliabilities were
low, they were sufficient to provide valuable information. Given this and
the other findings, the AAS, while not producing results which advocate its
use for selection and classification purposes, still may have some
potential. For example, it may be useful for identification of the two or
three top high-driver attributes for an MOS, or for evaluation of a
narrowly defined task, such as a particular kind of mission. Our
debriefings with SMEs lead us to believe that any future use of the AAS or
similar kinds of instruments really should involve an intensive training
session. SMEs should be given thorough explanations, with examples, of what
the attributes entail and helped to see how they relate to various aspects
of the job.


REFERENCES

Eaton, N. K., Goer, M. H., Harris, J. H., & Zook, L. M. (1984, October).
Improving the selection, classification, and utilization of Army enlisted
personnel: Annual report, 1984 fiscal year (Technical Report No. 660).
Alexandria, VA: U.S. Army Research Institute for the Behavioral and Social
Sciences.

Fleishman, E. A., & Quaintance, M. K. (1984). Taxonomies of human
performance. Orlando, FL: Academic Press.

Smith, E. P. (1985, November). Developing new attribute requirements scales
for military jobs. Proceedings of the 27th Annual Conference of the
Military Testing Association, San Diego, CA.

Smith, E. P., & Rossmeissl, P. G. (1984). Attribute assessment: Initial
test of scales for determining human requirements of military jobs.
Technical Report. U.S. Army Research Institute for the Behavioral Sciences,
Alexandria, VA.

Wing, H., Peterson, N. G., & Hoffman, R. G. (1984, August). Expert
judgments of predictor-criterion validity relationships. Paper presented at
the 92nd Annual Convention of the American Psychological Association,
Toronto, Canada.




SHORT  VERSUS  LONG  TERM  TENURE  AS  A  CRITERION  FOR 
VALIDATING  BIODATA 


Elizabeth  P.  Smith 
Clinton B. Walker

U.S.  Army  Research  Institute 


Presented  on  Session,  "Validating  Selection  Criteria" 

At  the  Annual  Conference  of  the 
Military  Testing  Association 
Mystic,  Connecticut 

November  1986 


The  views  expressed  In  this  paper  are  those  of  the  authors  and  do  not 
necessarily  reflect  the  official  opinions  and  policies  of  the  U.S.  Army 
Research  Institute  or  the  Department  of  the  Army. 


Short Versus Long Term Tenure as a Criterion for Validating Biodata

Elizabeth P. Smith and Clinton B. Walker¹

U.S. Army Research Institute for the Behavioral and Social Sciences


This research tests the hypothesis that the traditional criterion for
validating biodata in military research, viz. attrition during the first
six months of service versus successful completion of that period, has
produced less effective scoring keys and lower validities than a longer
criterion period would. This hypothesis is based on two findings. First, at
least half of attritions in previous research have occurred after the first
six months of service (Goodstadt & Yedlin, 1980; Hicks, 1981). Second, only
half as many items in a 60-item biodata instrument were keyable at the
six-month point as were at tenures of one to three years in data from 5,941
applicants to the Army in FY1981 and 1982 (Walker, 1985). If these findings
are generally true, then keying on tenures longer than six months will move
many first term attritions from the successful criterion group to the
unsuccessful one, where they belong, and will produce a larger pool of
keyable items. Both of those results should improve validity. In the
present paper, items from the Army's Military Applicant Profile (MAP) are
keyed on status at the 6-month point and then at the 39 - 45 month point,
depending on date of entry, and the validities are compared for those two
criterion periods.

Method 


Instrument


A 240-question research version of the MAP, which is a multiple choice
biodata questionnaire, provided the items. Two forms of the instrument,
with different sequences of the items, were used. In content, the questions
deal with self-esteem, motives for enlisting, experiences in school, work
experience, expectations of military life, social habits, experiences in
the family, athletic activity, and miscellaneous other experiences.

Sample

The sample was 9,416 receptees at all seven Army Reception Stations who
took the instrument in January-June 1982. This number included 7,653 males,
of which 6,403 were high school graduates and 1,250 were non-graduates or
GED holders. Also in the sample, but examined only for cross-validity, were
1,763 females, all high school graduates.

Criteria 


All cases were divided into "stayers" and "leavers" as follows. Stayers
were either on Active Duty at the end of the period being examined or had
been discharged for positive reasons (e.g., end of enlistment, transfer
into an officer candidate program) or "no fault" reasons (e.g., medical,
hardship). Leavers were cases who had been discharged for any causes other
than those above. These latter cases were presumed to have been discharged
early for any of various failures to adapt to Army life. For comparing
short and longer tenures as criteria, the status of the cases was examined
first at the end of the initial six months of service and then as of 1
October 1985, which was from 39 to 45 months after accession. Leavers after
the first six months were in the successful group for the first analysis
and in the unsuccessful group for the longer tenure.

¹The opinions in this paper are the authors' and do not necessarily reflect
views or policy of the Army Research Institute or the Department of the
Army. Richardson, Bellows, Henry, and Company, Inc., under contract to Army
Research Institute, developed the items for this work and collected the raw
predictor data and six-month criterion data. Joseph Stephenson created the
dataset with the longer tenures. We gratefully acknowledge his support.

Procedure 


Empirically derived scoring keys were developed on a 60% sample of all of
the males. To select items for keying, we ran item-level chi square tests
on the frequencies with which the separate response choices were picked by
the criterion groups (stay vs. leave). Items giving p < .05 were retained
for keying. These items were keyed using a horizontal percentage method
(Cascio, 1982; Riegelhaupt & Bonczar, 1985), weighted for differences in
sizes of the criterion groups. These weighted percentages of stayers were
then rounded and converted to single digit weights ranging from -1 to +3.
The conversion rule was as follows: up to 24% stayers = -1; 25 to 34% = 0;
35 to 44% = 1; 45 to 54% = 2; >54% = 3. Under this rule for assigning
weights, some items were weighted more heavily than others by having a
wider range of possible scores.
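
The conversion rule can be written out as a short sketch. The function names, the dictionary layout of the key, and the example items are hypothetical; only the five percentage bands come from the text. Note that Python's built-in round() sends exact halves to the nearest even integer, which may differ from the rounding used in the original work.

```python
def stayer_weight(pct_stayers: float) -> int:
    """Map a weighted percentage of stayers to a key weight from -1 to +3."""
    pct = round(pct_stayers)  # percentages are rounded before conversion
    if pct <= 24:
        return -1
    if pct <= 34:
        return 0
    if pct <= 44:
        return 1
    if pct <= 54:
        return 2
    return 3


def score_case(responses: dict, key: dict) -> int:
    """Sum the weights of the options one respondent chose.

    responses maps item id -> chosen option; key maps item id -> {option: weight}.
    Items missing from the key (not retained for keying) contribute nothing.
    """
    return sum(
        key[item][choice]
        for item, choice in responses.items()
        if item in key and choice in key[item]
    )


# Two illustrative items: q1 spans -1..3, q2 only 1..2, so q1 carries more weight.
key = {
    "q1": {"a": stayer_weight(20), "b": stayer_weight(60)},
    "q2": {"a": stayer_weight(40), "b": stayer_weight(50)},
}
print(score_case({"q1": "b", "q2": "a", "q9": "c"}, key))
```

The two example items show the point made in the last sentence of the paragraph: an item whose options span the full -1 to +3 range can move a total score by four points, while an item whose options all fall in adjacent bands can move it by only one.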

Item scores for each case were summed and tested for differences between
criterion groups. Then, point biserials were calculated on the relation
between total scores and the dichotomous stay-leave criterion. After
finding validities on the development sample, we computed validities on the
independent holdout sample of all males, on two random samples of the
females (60% and 40%), and on similar splits of the two male groups
(graduates and non-graduates) which were subsets of the larger development
and holdout groups. These procedures were followed first for the short
criterion period (maximum service of six months) and then, on the same
cases, for the longer criterion period.

As a check on whether the same items would be effective for predicting
success over both short and long criterion periods, we divided items into
those which were unique to each key (i.e., two sets) and those that were
common to both keys. Total keyed scores for each set were then validated.
We also ran a second kind of cross-validation to find how well each key
works in predicting the length of service on which it was not developed.
That is, we calculated validities for the long-tenure key on the short
criterion period and for the short-tenure key on the long criterion period.


Results

Table 1 shows how many items were keyable at both tenures and how many
were uniquely keyable at only one. Validities for these sets of items and
for the total set that was keyable for each condition (unique plus common)
are


Table 1

Validities for sets of items that were keyable at only the short
tenure, only the long tenure, and at both

                           Tenure at Which Items Were Keyed

                           Short                      Long
Criterion                  Total   Unique  Common     Total   Unique  Common
  Items (n)                (145)   (23)    (122)      (181)   (59)    (122)

Short
  Development sample        .25     .17     .24        .18     .10     .21
  Holdout sample            .19     .14     .19        .18     .11     .19

Long
  Development sample        .22     .09     .23        .31     .27     .30
  Holdout sample            .18     .11     .18        .26     .25     .24

Note. The critical value for a difference between two independent
correlation coefficients, one for the development sample (n = 4,594) and
one for the holdout sample (n = 3,059), is .046 (p < .05, two-tailed).
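
The paper does not say how the .046 critical value in the note was obtained. One common approach that reproduces it is the Fisher z test for two independent correlations, sketched below under that assumption; the function names are hypothetical.

```python
import math


def critical_difference(n1: int, n2: int, z_crit: float = 1.96) -> float:
    """Smallest difference (in Fisher z units) that is significant, two-tailed."""
    standard_error = math.sqrt(1.0 / (n1 - 3) + 1.0 / (n2 - 3))
    return z_crit * standard_error


def correlations_differ(r1: float, n1: int, r2: float, n2: int,
                        z_crit: float = 1.96) -> bool:
    """Compare two independent correlations after the Fisher z transform."""
    z1, z2 = math.atanh(r1), math.atanh(r2)
    return abs(z1 - z2) > critical_difference(n1, n2, z_crit)


# Development (n = 4,594) versus holdout (n = 3,059) samples from the table note.
print(round(critical_difference(4594, 3059), 3))
# Short key, short criterion, total item set: .25 versus .19.
print(correlations_differ(0.25, 4594, 0.19, 3059))
```

For correlations as small as those in Table 1, a difference in Fisher z units is close to the raw difference in r, so the critical value can be read directly against the table entries as an approximation.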


Table 2

Validities by sample and by tenures for keying and for validating; rates
of success

                          Tenure on Which the Items Were Keyed

                          Short (145 Items)          Long (181 Items)
                          Criterion length:          Criterion length:
Group               N     Short   Long   % Stay      Short   Long   % Stay

All Males
  Development     4,594    .25    .22     .87         .18    .31     .75
  Holdout         3,059    .19    .18     .86         .18    .26     .75

Females
  Sample 1        1,077    .14    .11     .80         .14    .15     .77
  Sample 2          686    .19    .15     .79         .16    .16     .76

Non-graduate males
  Sample 1          743    .20    .12     .79         .13    .19     .56
  Sample 2          507    .20    .11     .80         .12    .19     .58

Graduate males
  Sample 1        3,888    .22    .20     .88         .17    .27     .79
  Sample 2        2,515    .21    .20     .88         .16    .25     .79



also given. Validities and cross-validities at both the tenure for keying
and the other tenure are given in Tables 1 and 2. Table 2 gives validities
and success rates for various groups of cases: all males, females, graduate
males, and non-graduate males.

Table 3

Descriptive statistics on development and holdout samples as a function of
the tenure for keying items and the criterion for validating total scores

                          Tenure on Which the Items Were Keyed

                          Short                          Long
Criterion                 n      M      sd     t*        n      M      sd     t*

Short
  Development sample
    Stayers             3,993  252.5   15.6   14.01    3,993  257.3   20.3   11.27
    Leavers               601  240.2   20.6              601  245.9   23.6
  Holdout sample
    Stayers             2,629  252.3   15.5    9.43    2,629  256.9   19.9    9.77
    Leavers               430  243.1   19.2              430  246.7   21.3

Long
  Development sample
    Stayers             3,457  253.0   15.5   14.02    3,457  259.6   19.4   20.64
    Leavers             1,137  244.3   16.9            1,137  244.5   21.9
  Holdout sample
    Stayers             2,306  252.7   15.4    9.55    2,306  256.5   19.5   14.88
    Leavers               753  245.7   18.1              753  246.2   20.1

*p = .0001

Table 3 gives mean total scores and standard deviations for stayers and
leavers in the development and cross-validation samples and results of
t-tests on their means. These results are given for the cases where items
were keyed and validated on the same and on different time periods.
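
The summary statistics in Table 3 are enough to recompute both a t-value and a point-biserial validity. The sketch below is illustrative only: the function names are hypothetical, and the unequal-variance form of the t-test is an assumption (the paper does not name the form, although this one comes close to the tabled values).

```python
import math


def unequal_variance_t(n1, m1, sd1, n0, m0, sd0):
    """t statistic for two group means without pooling the variances."""
    standard_error = math.sqrt(sd1 ** 2 / n1 + sd0 ** 2 / n0)
    return (m1 - m0) / standard_error


def point_biserial(n1, m1, sd1, n0, m0, sd0):
    """Point-biserial r between group membership and score, from group summaries."""
    n = n1 + n0
    ss_within = (n1 - 1) * sd1 ** 2 + (n0 - 1) * sd0 ** 2
    ss_between = n1 * n0 / n * (m1 - m0) ** 2
    r = math.sqrt(ss_between / (ss_between + ss_within))
    return math.copysign(r, m1 - m0)


# Development sample, short key scored against the short criterion (Table 3).
stayers = (3993, 252.5, 15.6)
leavers = (601, 240.2, 20.6)
print(round(unequal_variance_t(*stayers, *leavers), 2))
print(round(point_biserial(*stayers, *leavers), 2))
```

Because the means and standard deviations in the table are rounded, recomputed values can differ from the published ones in the second decimal place.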


Discussion

In five different respects, these data support the hypothesis that
tenures longer than the traditional six months are better for keying and
validating biodata. First, a full 46% of attrition in this sample occurred
after the six-month point. Thus, a key developed at that point is degraded
by the presence of almost half of the leavers in the successful criterion
group. Second, while over half of the valid items are keyable at both the
short and long tenures, more than twice as many are uniquely keyable at the
longer one (59 vs 23). Thus a longer instrument results from extending the
period for keying.




Third, validity and cross-validity are higher when items are keyed and
validated on the longer period. It is true that congruence in the tenures
for keying and validating (i.e., either Short key with Short criterion or
Long key with Long criterion) produces the highest sets of validities here;
but still the original validity in the Short-Short condition (.25) does not
exceed the cross-validity in the Long-Long condition (.26). Similarly, the
Long key for the common items has as high a cross-validity for the Short
criterion as does the Short key for any set of items, while it has a higher
validity at the Long criterion than any set of items with the Short key does.

Fourth, shrinkage of cross-validities is less for item sets that are
keyed at the long tenure. In Table 1 the median shrinkage for Short keys is
.045 while for Long keys it is .02. Finally, the largest mean differences in
total score, both in terms of keyed points and in t-value, are for keying and
validating at the longer tenure (Table 3).

The data in Table 1 support one other optimistic conclusion. Although
the sets of unique items have fairly low validities for the criterion on
which they were not keyable, the 59 items which were significant at only the
long tenure have a good validity and cross-validity for the longer criterion.
Among the highest validities in that table are those that come from this set
of about one-third of the items that are useful over that longer period. This
finding implies that there may be enough valid items to produce several test
forms of satisfactory validity. Among other things, the issue of how to
assign items to forms needs to be addressed.

A second topic for further research is that of possible differences in
early and late leavers. If found, any such differences might help to explain
differences between leavers and stayers. A comparison of the content of the
two unique sets of items may yield some hypotheses on this issue.

Although these results confirm the statistical superiority of keying and
validating on longer tenures, that practice has a cost: that of delaying
implementation of the instrument while the criterion matures. One question
for further research is how to balance the benefits of high validity with
those of early implementability so as to maximize the net benefit.

The results for females and for non-graduate males are not as positive as
for men overall. Whether a good unisex scoring key could be developed
remains to be seen. From the percents of stayers in Table 2, attrition
seems to be a somewhat different process in males and females: unlike males'
attrition, almost all of females' occurs in the first six months.

Even though the samples of females and non-graduate males are large in
absolute numbers, they may not be large enough in these data to produce
stable performance in a biodata instrument. Two aspects of the military
research setting make results from validations of non-cognitive predictors
relatively unstable. First, attrition is managed, and policy on acceptable
levels thereof varies over the years. Thus the criterion is driven by at
least one force that is not tightly connected with the characteristics of the
examinees. Second, the characteristics of the applicant and accession pools
also change over the years. For example, a decade ago about half of
accessions were non-graduate males; now the rate is around 10%. These facts
make it important to use large, stable samples for developing keys.


Previous attempts to evaluate the validity of MAP in the operational
setting have found validities to be much lower than in the research setting
(Walker, 1984). Unlike the present research, past work in developing scoring
keys has not cross-validated them. The robust cross-validities for the
long-long condition here give reason to believe that the keys developed here
would retain a good level of validity if put into operation. Even with that
assumption, further research on the rates of accurate and inaccurate
selection decisions to be expected should be carried out to see whether the
instrument is likely to be cost-effective.

References

Cascio, W. F. (1982). Applied psychology in personnel management. Reston,
VA: Reston.

Goodstadt, B. E. & Yedlin, N. C. (1980). First tour attrition: Implications
for policy and research (Research Report 1246). Fort Benjamin
Harrison, IN: Army Research Institute.

Hicks, J. H. (1981, March). Trends in first-tour armed services enlisted
attrition rates. Paper presented at the Annual Meetings of the
Southeastern Psychological Association. Atlanta, GA.

Riegelhaupt, B. J. & Bonczar, T. P. (1985, October). The utility of
educational and biographical information for predicting military attrition.
Proceedings of the 27th Annual Meeting of the Military Testing
Association. San Diego, CA.

Walker, C. B. (1984, November). Validating the Army's Military Applicant
Profile against an expanded criterion space. Proceedings of the 26th
Annual Meeting of the Military Testing Association. Munich, FRG.

Walker, C. B. (1985, October). Three variables that may influence the
validity of biodata. Proceedings of the 27th Annual Meeting of the
Military Testing Association. San Diego, CA.




ASVAB  VALIDITIES  USING  IMPROVED  JOB  PERFORMANCE  MEASURES 


Lauress  L.  Wise 
Jeffrey  J.  McHenry 
American  Institutes  for  Research 


Paul G. Rossmeissl
U.S.  Army  Research  Institute 

Scott  H.  Oppler 

American  Institutes  for  Research 


Presented  on  Symposium, 

"Project  A  Concurrent  Validation:  Preliminary  Results" 

At  the  Annual  Conference  of  the 
Military  Testing  Association 
Mystic,  Connecticut 

November  1986 


The  views  expressed  in  this  paper  are  those  of  the  authors  and  do  not 
necessarily  reflect  the  official  opinions  and  policies  of  the  U.S.  Army 
Research  Institute  or  the  Department  of  the  Army. 




ASVAB VALIDITIES USING IMPROVED JOB PERFORMANCE MEASURES


Lauress L. Wise, Jeffrey J. McHenry - American Institutes for Research
*Paul G. Rossmeissl - Army Research Institute
**Scott H. Oppler - American Institutes for Research


Project A job performance measures are unique in their combination of depth
(work samples, ratings, knowledge tests, and administrative measures) and breadth
(19 very diverse jobs). This paper examines the validity of the Army's ASVAB
Aptitude Area (AA) Composites for predicting job performance as assessed by these
new measures. Project A performance measures have been organized into five
constructs (Wise, Campbell, McHenry, & Hanser, 1986). Four of these constructs
(General Soldiering Proficiency, Effort and Leadership, Personal Discipline, and
Physical Fitness and Military Bearing) are the same for each Military Occupational
Specialty (MOS). Armed Forces Qualifying Test (AFQT) scores and other selection
criteria (e.g., high school graduation, moral and physical requirements) are
designed to predict performance on these common constructs. The fifth construct,
Core Technical Proficiency (CTP), covers aspects of job performance unique to each
MOS. AA scores, used as job-specific selection criteria, are appropriately
validated against this construct.

In  addition  to  evaluating  current  AA  composites,  we  identified  specific 
alternative  composites.  We  did  not  identify  alternative  composites  for  every  MOS, 
since  we  had  data  for  only  19  of  the  more  than  250  entry-level  MOS.  Instead,  we 
identified  alternative  composites  for  each  cluster  of  jobs  that  currently  use  the 
same  AA  composite.  In  this  paper,  we  only  considered  redefining  the  existing 
composites. We did not consider changing the assignment of MOS to specific
composites. 


Methods 

Current forms of the ASVAB generate nine subtest scores: General Science
(GS), Arithmetic Reasoning (AR), Verbal (VE, combining Word Knowledge and Paragraph
Comprehension), Coding Speed (CS), Numerical Operations (NO), Auto/Shop Information
(AS), Mathematics Knowledge (MK), Mechanical Comprehension (MC), and Electronics
Information (EI). AA composites are defined as unweighted sums of four or fewer of
the standardized subtest scores. There are 255 such possible composites (126 using
four subtests, 84 using three, 36 using two, and 9 using a single subtest). We
evaluated all of them.
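
The count of 255 candidate composites follows directly from choosing 4, 3, 2,
or 1 of the nine subtests. A minimal sketch of that enumeration is given
below; the function and variable names are illustrative and do not come from
the original analysis.

```python
from itertools import combinations

# The nine ASVAB subtest abbreviations named in the text.
SUBTESTS = ["GS", "AR", "VE", "CS", "NO", "AS", "MK", "MC", "EI"]

def candidate_composites(subtests=SUBTESTS, max_size=4):
    """Return every unordered set of 1..max_size subtests, largest sets first."""
    result = []
    for size in range(max_size, 0, -1):
        result.extend(combinations(subtests, size))
    return result

composites = candidate_composites()
# Tally how many composites use four, three, two, and one subtest.
counts = {size: sum(1 for c in composites if len(c) == size)
          for size in (4, 3, 2, 1)}
print(counts, len(composites))
```

Each tuple, such as ("AR", "VE", "MK"), stands for the unweighted sum of the
corresponding standardized subtest scores.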

Project  A  Concurrent  Validation  (CV)  data  were  used  in  evaluating  the  current 
composites.  The  CV  data  included  the  new  job  performance  measures  applied  to  over 
9,000  soldiers  in  19  different  MOS.  Table  1  shows  CV  sample  sizes  by  MOS  and  race 
and  gender  and  also  the  ASVAB  subtest  and  the  CTP  criterion  means  and  standard 
deviations. 


This research was funded by the U.S. Army Research Institute for the Behavioral and
Social Sciences, Contract Number MDA903-82-C-0531. Statements expressed in this
paper are those of the authors and do not necessarily reflect the official opinions
or policies of the U.S. Army Research Institute or the Department of the Army.

* Dr. Rossmeissl is now with Hay Systems, Inc. in Washington, D.C.

** Mr. Oppler has returned to graduate work at the University of Minnesota.




Four separate criteria were used in evaluating current and alternative
composites: (1) predictive validity, (2) fairness to Blacks and females, (3)
classification efficiency, and (4) face validity. Each is described briefly before
proceeding to a discussion of the results.


Predictive Validity. The correlation of each composite with the CTP score was
adjusted for restriction of range due to explicit selection. A multivariate
correction due to Lawley (Lord & Novick, 1968, p. 146) was used with each of the
ASVAB subtests treated as a separate selection variable. The result was used as
the measure of predictive validity. No adjustment was made for "shrinkage" in
cross-validation since separate regression coefficients were not estimated. For
evaluation of the current composites, this is entirely appropriate. Because we did
pick among a large number of alternative composites on the basis of the data at
hand, some shrinkage should be expected for the alternatives that appear most
extreme. Conventional shrinkage formulas do not handle this situation, so our best
approach is to be somewhat conservative in adopting new alternatives to the
existing composites.
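
For reference, the Lawley multivariate correction is commonly stated as
below. The notation is ours, not the paper's: x is the vector of explicit
selection variables (here the nine subtests), y is the criterion (CTP), V
denotes covariances in the selected sample, \Sigma_{xx} is the subtest
covariance matrix in the unselected norm population, and w is a 0/1 vector
marking the subtests in a composite.

```latex
\begin{align}
\hat{\Sigma}_{xy} &= \Sigma_{xx} V_{xx}^{-1} V_{xy} \\
\hat{\Sigma}_{yy} &= V_{yy} - V_{yx}\left(V_{xx}^{-1}
                     - V_{xx}^{-1}\Sigma_{xx}V_{xx}^{-1}\right)V_{xy} \\
\hat{\rho}_{wy}   &= \frac{w^{\top}\hat{\Sigma}_{xy}}
                     {\sqrt{\left(w^{\top}\Sigma_{xx}w\right)\hat{\Sigma}_{yy}}}
\end{align}
```

With a single selection variable these expressions reduce to the familiar
univariate correction, r u / \sqrt{1 - r^2 + r^2 u^2}, where u is the ratio
of the unrestricted to the restricted standard deviation.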


Fairness to Blacks and Females. Separate regression equations were computed
by race and gender where there were at least 50 examinees. Both slope and
intercept differences were identified. A single overall measure of the difference
in the separate equations was defined in terms of the expected criterion difference
for an AA score of 100 (the estimated 1980 norm population mean). Since selection
cutoffs varied between 85 and 110 for the MOS in question, a score of 100 was
selected as being in the heart of the critical region for evaluating the selection
fairness of alternative composites. Differences in the prediction equations at
points significantly below or above this value would have little impact on
determination of applicant qualification. The difference in predicted values was
converted to a t score by dividing by the standard error of the estimate of the
difference (Potthoff, 1964).
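
The comparison described above can be sketched as follows: fit a separate
regression line in each group, predict the criterion at an AA score of 100,
and divide the difference by its standard error. The error term below uses
each group's own residual variance, which is an assumption on our part; the
paper cites Potthoff (1964) without giving the formula, and all names here
are illustrative.

```python
import math

def fit_line(x, y):
    """Ordinary least-squares line plus the pieces needed for prediction error."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    sse = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    return {"n": n, "mean_x": mean_x, "sxx": sxx, "slope": slope,
            "intercept": intercept, "resid_var": sse / (n - 2)}

def predict_with_variance(fit, x0):
    """Predicted mean criterion at x0 and the sampling variance of that prediction."""
    prediction = fit["intercept"] + fit["slope"] * x0
    variance = fit["resid_var"] * (1.0 / fit["n"]
                                   + (x0 - fit["mean_x"]) ** 2 / fit["sxx"])
    return prediction, variance

def fairness_t(x_a, y_a, x_b, y_b, x0=100.0):
    """Difference in predicted criterion (group A minus group B) at x0, and its t."""
    pred_a, var_a = predict_with_variance(fit_line(x_a, y_a), x0)
    pred_b, var_b = predict_with_variance(fit_line(x_b, y_b), x0)
    difference = pred_a - pred_b
    return difference, difference / math.sqrt(var_a + var_b)

# Made-up illustrative data, not values from the study.
aa_scores = [80.0, 90.0, 100.0, 110.0, 120.0]
criterion_a = [1.0, 2.2, 2.9, 4.1, 5.0]
criterion_b = [value - 1.0 for value in criterion_a]
example_diff, example_t = fairness_t(aa_scores, criterion_a, aa_scores, criterion_b)
```

The sign convention (which group is subtracted from which) is likewise a
choice made for this sketch and may differ from the one used in Table 2.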


Classification Efficiency. The Brogden index, defined as the square root of
the average validity times the square root of one minus the average of the
intercorrelations among the composites, was used as a measure of classification
efficiency. This statistic is an indicator of the accuracy of predictions of
differences in an individual's expected performance across jobs.
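
Taken literally, the definition above can be computed as in the short sketch
below. It follows the wording of the text (square root of the average
validity, times the square root of one minus the average intercorrelation);
the function name and the example inputs are ours and are not study data.

```python
import math

def brogden_index(validities, intercorrelations):
    """Classification-efficiency index as worded in the text.

    validities: validity coefficient of each composite.
    intercorrelations: correlations between distinct pairs of composites.
    """
    mean_validity = sum(validities) / len(validities)
    mean_intercorrelation = sum(intercorrelations) / len(intercorrelations)
    return math.sqrt(mean_validity) * math.sqrt(1.0 - mean_intercorrelation)

# Illustrative inputs only: sqrt(0.64) * sqrt(1 - 0.75) = 0.8 * 0.5
example_index = brogden_index([0.64], [0.75])
```

The index rises when composites are more valid on average and when they are
less correlated with one another, which is why swapping a subtest can change
classification efficiency even if validity barely moves.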

Face Validity. The final evaluation factor was face validity. Face validity
is not easily quantifiable, but is more appropriately used as a check of the
"reasonableness" of the results. It is our attempt to check purely empirical
results against some conception of theory. We would be uncomfortable, for example,
with results indicating that AS is an important predictor for clerical jobs, but
quite comfortable with AS as an important predictor for vehicle mechanics.


Results 


Table 2 shows validities, Brogden indices (Cls. Eff.), and, where
appropriate, race and gender t statistics for each contending AA composite.
Separate statistics are shown for each applicable MOS and unweighted averages of
the validities and t statistics are shown for the cluster as a whole. Each row of
statistics corresponds to a different composite. The first row gives statistics
for the current composite. Rows with data on alternative composites are labelled
A1 through A9. Data are also shown for the CL and SC composites replaced in 1984
after our prior analyses (McLaughlin, Rossmeissl, Wise, Brandt, & Wang, 1984) with
the previous composites labelled PR. Where some other of the current composites
has a higher average validity than the operational composite for the cluster, data
are shown in rows that are labelled according to the other composite. The results
presented in Table 2 are discussed separately for each of the current AA
composites.

Clerical (CL). The current CL composite has a higher average validity than
any alternative. It does, however, underpredict female performance in the two
clerical specialties where separate predictions were generated. The addition of
either NO or CS significantly reduces the underprediction for females without
significantly reducing validity. Adding NO reduces underprediction the most, while
adding CS has the greatest face validity and results in slightly greater
classification efficiency. A slightly different pattern was found for 76W. The
addition of AS increases validity for predicting 76W performance, while decreasing
validity for predicting 71L and 76Y performances. Notwithstanding these
differences, the current and primary alternative CL composites predict performance
in all three clerical MOS quite well.

Combat (CO). The current CO has high validity for each of the MOS examined.
Some gain in validity would be realized by substituting GS for CS and, perhaps,
also swapping MK for AR. The inclusion of GS would improve prediction in all three
MOS. The greater contribution of GS also is rational in light of increasing
technical sophistication in the systems used in combat specialties. Adding GS
would also reduce the small degree of overprediction of the performance of Blacks.

Electronic (EL). The current EL composite does quite well for the one EL
specialty examined. Substitution of NO for one or both of the quantitative
subtests would increase both predictive validity and classification efficiency, but
not to any practical extent.

Field  Artillery  (FA).  Neither  the  current  FA  nor  any  alternative  appears  to 
have  a  very  high  validity  for  predicting  13B  performance.  Consideration  of 
alternative  composites  is  motivated  by  the  fact  that  several  other  current 
composites  have  higher  validities  for  predicting  13B  performance  than  the  current 
FA  composite.  Substitution  of  NO  and  AS  for  CS  and  MK  would  yield  the  most 
significant  gains.  Such  substitution  also  significantly  reduces  overprediction  for 
Blacks. 


General Maintenance (GM). Very high validities were found for the current GM
composite for both 51B and 55B. Very slight gains might result from substituting
VE for EI or from simply dropping EI, but these gains would be offset by small
increases in overprediction of Blacks' performance and slightly lower
classification efficiency estimates.

Mechanical Maintenance (MM). High validities were found for the current MM
composite in predicting both 63B and 67N performance. Small gains in the
prediction of 63B performance and increased classification efficiency would result
from dropping the NO subtest.

Operators/Food (OF). The OF results closely parallel the CL results. Female
performance is significantly underpredicted for 94B. Another specialty, 64C, shows
a somewhat different pattern of validities, with AS again (and not surprisingly)
adding significantly to the predictive validity of this one specialty. In fact,
the same composites appear optimal for both the CL and OF MOS: AR+VE+MK+NO for
16S and 94B (as for 71L and 76Y) and AR+VE+MK+AS for 64C (as for 76W).
Substituting AR and MK for AS and MC would significantly reduce underprediction of
female performance for 94B while increasing overall validity.

Surveillance and Communication (SC). A high predictive validity was found for
the current SC composite. Some gain in validity, along with a slight increase in
classification efficiency, would result if MC were replaced by NO. This would lead
to a small increase in the underprediction of performance for Blacks. If MK were
also substituted for AR, the same gains in validity and classification efficiency
could be obtained along with a decrease in underprediction of Blacks' performance.

Skilled Technical (ST). The current ST is a true Army composite: it is all
that it can be. It has a higher average validity than any possible alternative,
and it shows no significant differences in the prediction of performance for Blacks
and females.


Summary 

The Army's existing AA composites were found to have very high validity for
predicting job-specific performance as assessed with the Project A measures. A few
changes to the existing AA composites to improve validity or reduce gender
differences were identified for further consideration. Specific recommendations
are:

CL:  Add  NO  to  reduce  gender  differences. 

CO: Replace CS with GS to increase validity/reduce race differences.

FA:  Replace  CS  and  MK  with  NO  and  AS  to  increase  validity. 

MM:  Drop  NO  to  increase  validity. 

OF:  Replace  NO  and  MC  with  AR  and  MK  to  increase  validity. 

Reassign  94B  (and  similar  MOS)  to  CL  to  reduce  gender  differences. 

SC: Replace AR and MC with MK and NO to increase validity and reduce
race differences.

Recommendations  for  further  analyses  include:  (1)  investigation  of 
criterion  factors  associated  with  low  ASVAB  correlations  for  the  13B 
measures  and  significant  gender  differences  for  71L  and  94B  and  (2) 
evaluation  of  alternative  assignment  of  MOS  to  composites,  particularly 
for  the  CL  and  OF  composites. 

References 

Lord, F. & Novick, M. (1968). Statistical theories of mental test
scores. Reading, MA: Addison-Wesley.

McLaughlin, D., Rossmeissl, P., Wise, L., Brandt, D., & Wang, M. (1984).
Validation of current and alternative ASVAB Area Composites, based on
training and SQT information on FY 1981 and FY 1982 Enlisted
Accessions (Technical Report 651). Alexandria, VA: U.S. Army
Research Institute.

Potthoff, R. (1964). On the Johnson-Neyman technique and some
extensions thereof. Psychometrika, 241-245.

Wise, L., Campbell, J., McHenry, J., & Hanser, L. (1986, August). A
latent structure model of job performance factors. Paper presented at
the annual meeting of the American Psychological Association.




Table  1.  Descriptive  Statistics 


INFANTRY       CO
COMBAT ENG     CO
CANNON CREW    FA
MANPAD CREW    OF
ARMOR CREW     CO
TOW/DRG REP    EL
RADIO/TTY      SC
CRPNT/MSNRY    GM
NBC SPEC       ST
AMMO SPEC      GM
VEHCLE MECH    MM
MOTOR TRANS    OF
HELCPTR REP    MM
ADMIN CLERK    CL
PETRO SPPLY    CL
UNIT SUPPLY    CL
MEDIC SPEC     ST
FOOD SERVCE    OF

ifisas  JL  SIE  ^ -AS  J£  J3  JS  JS  JB  JS -El  £!E  B  Mfi  AS  !S  £  £1 

ALL  491  514  529  539  519  525  515  557  515  551  533  80  80  73  63  64  65  76  78  75  73 

ALL  5U  509  506  527  496  510  499  555  502  539  522  96  86  70  71  66  60  81  77  83  76 

BLACK  108  453  433  482  440  495  489  479  460  478  473  78  66  47  58  65  54  62  47  58  57 

UNITE  385  529  533  542  519  514  501  584  515  559  539  94  77  70  62  66  61  65  80  81  73 

ALL  464  510  487  519  488  516  497  514  495  509  502  85  87  69  70  61  65  91  66  83  78 

BLACK  168  485  438  491  456  516  493  458  478  466  467  84  76  60  68  59  68  71  55  70  67 

UNITE  250  528  528  544  518  518  SOI  563  507  546  533  82  73  65  57  61  63  74  71  74  69 

ALL  338  516  509  519  SOS  527  498  548  495  531  527  94  81  79  66  64  76  81  77  84  76 

BLACK  89  494  449  469  460  540  489  481  464  477  484  78  77  57  52  60  75  67  55  67  60 

UNITE  232  524  534  541  524  522  500  578  510  553  546  99  71  77  62  65  76  69  81  79  74 

ALL  394  514  527  536  513  515  506  567  515  549  535  75  84  73  69  66  67  79  77  78  80 

BLACK  71  469  459  497  465  499  483  497  477  488  474  69  77  56  64  65  67  66  63  63  63 

UNITE  297  524  548  547  530  517  511  588  525  568  553  75  75  74  60  65  66  70  76  73  76 

ALL  123  505  540  552  524  518  504  561  532  548  560  101  66  62  58  69  68  75  69  72  70 

ALL  289  508  518  540  521  554  547  547  521  527  514  85  76  72  59  54  60  80  79  86  79 

BLACK  74  488  461  494  494  564  557  498  493  479  482  69  68  71  60  U  66  70  63  77  64 

UNITE  204  513  538  555  532  550  542  565  529  543  525  89  68  66  56  56  56  77  82  84  82 

ALL  69  513  508  510  497  505  481  555  491  536  533  101  72  72  60  70  66  70  67  76  64 

ALL  340  507  540  543  529  517  503  543  533  543  531  99  71  73  57  70  69  82  74  72  76 

BLACK  84  466  505  516  515  508  482  485  516  500  493  98  66  69  55  70  72  63  64  55  68 

UNITE  223  522  558  554  541  520  511  571  538  562  549  95  64  74  52  71  67  74  76  70  71 

ALL  203  507  497  495  475  491  476  526  481  490  523  97  64  65  62  64  68  69  60  76  57 

BLACK  75  472  477  469  458  492  475  486  470  451  516  99  48  56  53  61  69  52  43  56  44 

UNITE  112  531  513  513  486  491  475  556  486  519  527  89  69  63  65  68  70  66  70  78  64 

ALL  478  513  506  528  496  520  509  579  501  543  536  76  78  71  62  63  59  78  69  79  65 

BLACK  78  464  as  478  456  520  491  510  476  479  503  72  M  59  M  61  63  70  54  57  52 

UNITE  374  526  522  541  507  519  513  598  508  559  5a  70  72  69  57  63  59  69  71  75  M 

ALL  507  510  486  498  ai  513  499  Sa  483  522  509  72  75  76  63  65  67  75  68  76  71 

BLACK  121  487  4a  456  450  523  492  498  456  471  471  73  65  60  54  61  69  70  54  65  74 

UNITE  358  520  502  513  493  508  501  568  493  541  523  71  77  62  M  65  68  69  72  M 

FEMALE  52  495  485  SOS  520  554  559  AM  490  480  454  71  73  78  55  65  67  65  61  72  54 
MALE  455  512  486  498  477  509  492  558  483  526  515  72  75  76  63  63  M  70  69  75  70 
ALL  238  510  567  567  5a  550  531  613  550  601  582  95  60  59  47  53  63  54  67  54  57 
ALL  427  506  493  528  514  562  552  476  515  484  481  87  82  72  59  49  61  79  75  79  69 
BLACK  159  491  4a  499  495  563  535  4a  498  454  4a  81  74  65  59  45  63  61  69  71  55 
UNITE  235  516  518  sa  531  560  560  502  528  505  494  89  79  70  51  52  58  a  75  79  74 
FEMALE  237  524  486  519  522  Sa  561  a7  508  ai  465  72  73  67  49  50  63  a  a  tf  52
MALE  190  483  502  539  505  558  Sa  514  524  512  501  98  91  76  M  a  57  82  a  83  82 
ALL  339  519  479  511  494  536  512  508  491  500  498  95  90  74  69  54  65  99  72  91  81 
BLACK  139  476  430  472  463  539  500  a7  ai  4a  ai  88  73  63  65  52  a  73  60  a  67 
UNITE  174  551  521  539  522  535  518  560  514  Sa  530  88  85  69  60  55  a  90  73  a  78 
ALL  4a  516  489  518  500  550  531  496  507  496  496  93  85  74  67  51  58  86  75  a  78 
BLACK  169  487  442  479  473  553  518  455  473  453  463  90  69  62  60  a  54  71  60  65  63 
UNITE  231  536  528  547  524  547  538  532  530  530  524  93  76  71  60  56  61  83  78  83  78 
FEMALE  75  519  463  SOI  492  569  551  429  494  4a  453  M  73  62  59  a  61  57  71  72  62 
MALE  369  516  494  522  501  Sa  527  510  509  506  SM  95  87  76  M  51  57  a  76  83  78 
ALL  392  514  547  sa  sa  525  520  528  530  sa  524  79  62  a  a  69  70  82  71  70  M 
BLACK  91  486  519  512  521  519  508  486  511  495  496  72  50  58  42  74  62  70  65  54  58
UNITE  2a  525  562  555  550  527  524  sa  538  sa  534  a  61  a  42  a  71  tt  72  tt  70 
FEMALE  116  513  532  545  542  550  549  465  Sa  SM  475  81  59  59  a  58  65  M  M  a  52 
MALE  276  515  554  5a  539  514  508  555  525  559  Sa  79  a  M  a  71  a  73  72  67  a 
ALL  3a  526  496  515  503  533  510  516  495  510  503  90  M  77  a  a  a  82  72  76  75 
BLACK  124  493  a9  4a  471  534  501  469  4a  4a  471  77  70  58  56  a  72  a  51  56  a 
WHITE  222  5a  524  543  524  532  517  Sa  515  536  524  94  74  73  58  a  a  79  73  75  74 


FEMALE  78  553  474  499  513  562  5a  4a  489  a7  4a  79  M  65  53  57  82  a  M  a  59 

MALE  290  519  502  519  501  526  501  534  497  522  518  92  79  a  65  a  62  77  74  75  72 

95B:  MIL  POLICE  ST  ALL  597  504  562  554  542  530  519  573  537  571  550  74  53  a  42  62  62  a  61  58  62 




Table 2. Validity, Cultural Fairness, and Classification Efficiency
Indicators for Current and Other ASVAB Composites


Current/Other   Avg.   Avg t    Avg t   Cls.          t by   t by          t by   t by          t by   t by
Composites      Val.   by Race  by Sex  Eff.    Val.  Race   Sex    Val.   Race   Sex    Val.   Race   Sex

71L: ADMIN SPEC

76W: PETRO SPPLY

76Y: UNIT SUPPLY

a:  AR^VE-HiC  .661  -2.2 

16.1  .231 

.64  .6  20.4 

.67  -5.8 

.67  -1.4 

11.8 

PR:  VE4N0*CS  .578  -5.7 

3.1  .248 

.59  -.2  5.6 

.55  -12.8 

.60  -4.3 

.5 

A1:  AR»^«NO*MK  .656  -3.1 

6.7  .232 

.65  .4  10.6 

.65  -7.8 

.67  -2.0 

2.9 

A2:  A»»VE-K»HtC  .656  -2.2 

8.1  .233 

.65  1.6  11.4 

.65  -7.0 

.67  -1.1 

4.9 

A3:  AR^VE-»AS«NC  .655  -.5 

22.2  .222 

.60  1.0  32.2 

.70  -2.0 

.67  -.4 

12.3 

CO: COMBAT

11B: INFANTRYMAN

12B: COMBAT ENG

19E: ARMOR CREW

CO:  A»KS4AS*MC  .617  -3.2 

•  .231 

.66 

.64  -3.5 

.55  -3.0 

. 

A1:  6S«AS«MK«MC  .648  -1.9 

-  .229 

.67  - 

.67  -2.9 

.60  -1.0 

GN:  6S»AfrH«C4€l  .641  -2.5 

-  .230 

.67 

.67  -3.5 

.58  -1.5 

A2:  6$*m*AS  .643  -2.4 

-  .230 

.67 

.67  -3.3 

.59  -1.4 

. 

EL: ELECTRONIC

27E: TOW/DRGN REP

EL:  6S»AR-HK-»EI  .779 

-  .231 

.78 

A1:  GSHKHEl  .791 

-  .235 

.79 

A2:  6S*N0*MK«EI  .791 

-  .232 

.79 

FA: FIELD ARTILLERY

13B: CANNON CREW

FA:  AR4CS-»MK4MC  .341  -8.4 

-  .231 

.34 

A1:  OSHKMSHK  .383  -3.1 

-  .227 

.38 

A2:  AIMIO»AS«MC  .381  -3.8 

-  .227 

.38 

GM:  GENERAL  MAINTENANCE 

51B: CRPNT/MSNRY

55B: AMMO SPEC

GN:  GS«AS«MK^GI  .785  -5.0 

•  .231 

.81 

.76  -5.0 

A1:  6S«VE-»AS«MK  .798  -6.3 

•  .229 

.84 

.75  -6.3 

A2:  GS«AS«MK  .791  -6.4 

-  .230 

.84 

.74  -6.4 

A3:  GS«AR«VE«AS  .789  -4.5 

-  .228 

.82 

.76  -4.5 

A4:  GS«CS-HIS«MK  .789  -10.0 

-  .229 

.86 

.72  -10.0  • 

MM: MECHANICAL MAINTENANCE

63B: VEHCLE MECH

67N: HELCPTR REP

MM:  NO»AS«MC>El  .729  -4.7 

-  .231 

.66  -4.7 

.80 

A1:  AS^+EI  .745  -4.5 

.  .240 

.69  -4.5 

.80 

A2:  GS^AS^MC^I  .742  -4.4 

-  .233 

.68  -4.4 

.81 

A3:  AS«MK4MC^I  .739  -5.6 

-  .229 

.67  -5.6 

.81 

A4:  AIWe*AS*EI  .739  -4.3 

-  .230 

.67  -4.3 

.81 

AS:  GS^AS-HK  .738  -3.9 

-  .234 

.67  -3.9 

.81 

A6:  At»MC  .733  -3.5 

-  .244 

.68  -3.5 

.79 

OF: OPERATORS/FOOD

16S: MANPAD CREW

64C: MOTOR TRANS

94B: FOOD SERVCE

OF:  VE«NO»AS«MC  .538  -1.0 

8.4  .231 

.44  .9 

.52  -1.4  -4.6 

.65  -2.5 

21.3 

A1:  AR^VE+AS^  .571  .8 

9.0  .228 

.51  3.0 

.53  -.5  -14.1 

.68  -.2 

32.1 

A2:  GS^AR^AB^MK  .568  .5 

10.7  .228 

.50  2.9 

.54  -.1  -4.8 

.67  -1.5 

26.2 

A3;  AR*A8MRC  .567  -.2 

12.3  .230 

.49  2.1 

.54  -1.1  -2.3 

.66  -1.4 

26.9 

A4:  GS^AR-MRC  .561  -1.1 

10.0  .232 

.52  2.2 

.49  -3.6  -15.5 

.68  -1.9 

35.5 

A5:  GS^AR^VEHK  .561  -.8 

13.3  .231 

.52  2.7 

.48  -3.7  -17.6 

.69  -1.5 

44.1 

A6:  AR*VE-MRC  .558  -1.4 

13.2  .228 

.52  2.0 

.46  -5.2  -19.0 

.69  -1.2 

45.4 

A7:  AIN-VE-MK4MC  .566  -.4 

6.4  .234 

.50  1.7 

.51  -2.4  -24.7 

.69  -.6 

37.6 

A8:  AR»¥E*<MWK  .548  -4.8 

•1.8  .236 

.51  -.1 

.44  -10.8  -16.5 

.70  -3.4 

13.0 

A9:  AR^V8*CS4IK  .546  -3.2 

2.3  .236 

.51  .2 

.U  -6.9  -14.7 

.70  -2.9 

19.3 

EL:  6S»AR4MK«EI  .558  -.8 

9.4  .228 

.50  2.1 

.51  -2.0  -7.1 

.66  -2.6 

25.8 

ST:  6S»VE4MC«flC  .557  .6 

7.1  .228 

.50  -1.4 

.51  1.9  -16.9 

.66  -3.0 

31.1 

FA;  AR*CS-HRt4EI  .555  -2.9 

6.3  .230 

.49  -.7 

.49  -5.1  -22.6 

.69  -3.0 

35.3 

SC: SURVEILLANCE & COMMUNICATION

31C: RADIO/TTY OP

SC:  AR«VE<»ASMC  .693  1.9 

-  .231 

.69  1.9 

PR:  VE*«0*CS*AS  .701  .5 

-  .232 

.70  .5 

A1:  AR^VE-HNHAS  .729  2.4 

•  .233 

.73  2.4 

A2:  VE«NO»AS«MK  .729  .9 

-  .233 

.73  .9 

A3!  AR^VE-HIOHII  .728  1.2 

-  .234 

.73  1.2 

A4:  6S«AR«NO»€I  .727  2.0 

-  .232 

.73  2.0 

ST: SKILLED TECHNICAL

54E:  NBC  SPEC 

91A:  MEDIC  SPEC 

95B:  MIL  POLICE 

ST:  GS*'VE*MK*MC  .683  -1.5 

.1  .231 

.69  -1.6 

.73  -1.3  .1 

.63 

. 

A1:  GS*CS*AS*Mr  .679  -1.1 

.5  .231 

.67  -1.5 

.75  -1.5  .5 

.62 

. 

VALIDATION  ANALYSIS  FOR  NEW  PREDICTORS 


John P. Campbell

Human  Resources  Research  Organization 


Presented  at  a  Data  Analysis  Workshop  of  the 
Committee  on  Performance  of  Military  Personnel 

Baltimore 

December  1986 


The  views  expressed  in  this  paper  are  those  of  the  authors  and  do  not 
necessarily  reflect  the  official  opinions  and  policies  of  the  U.S.  Army 
Research  Institute  or  the  Department  of  the  Army. 




This is a working paper prepared for a conference on performance
measurement sponsored by the Committee on the Performance of Military
Personnel of the National Research Council. It reports some of the initial
selection test validation data collected by the Army's Selection and
Classification Project (Project A). These data should still be viewed as
preliminary and should not be cited, quoted, or distributed.

After  briefly  summarizing  the  objectives  of  the  project,  the  basic  data 
collection  design,  and  the  steps  used  in  the  development  of  the  new  selection 
tests and criterion measures, the initial validity results from the
concurrent  sample  will  be  presented. 

Much of the background information that outlines the objectives, sample
characteristics, selection test development, and performance measure
development is taken from a 1986 APA paper (Campbell, 1986) that gave an
overview of the entire project. Most of the results that are presented were
generated by analyses done since that time.


Overall  Project  A  Objectives 

Project  A  is  directed  at  multiple  operational  and  research  objectives. 
The  major  ones  are  shown  in  Table  1. 


The current Army selection/classification system for enlisted personnel
screens 300 to 400 thousand people each year, selects 120-140 thousand of
them, and assigns each individual to one of approximately 275 entry level
positions. The primary selection instrument is the Armed Services Vocational
Aptitude Battery (ASVAB), which currently has ten subtests and composites of
subtests developed for different categories of occupational specialties.
Cutting scores have been established for each job, or Military Occupational
Specialty (MOS), and if the individual is above the cutting score on the
appropriate ASVAB composite, assignments are made on the basis of Army needs,
training space availability, and individual preferences. A system of bonuses
is currently in use to influence individual preferences in the direction of
Army needs.

The  mandate  of  Project  A  is  to  develop  an  experimental  battery  of  new 
selection/classification  instruments,  validate  them  against  appropriate 
measures  of  job  performance,  assess  their  collective  differential  validity 
for  making  classification  decisions,  and  provide  the  information  necessary 
for  conducting  "what  if"  games  with  differential  weights  for  job  assignments, 
changes  in  cutting  scores,  quotas,  etc.  The  latter  activity  would  be  carried 
out in conjunction with the assignment algorithms developed by Project B.

In the course of trying to meet this mandate, Project A has taken a
broad  approach.  For  example,  we  have  tried  to  provide  a  systematic 
description,  in  a  taxonomic  sense,  of  the  universe  of  information  that  is 
potentially  useful  for  making  predictions  of  future  job  performance  and  to 
develop  a  model  of  its  latent  structure.  Similarly  the  Project  has  tried  to 
develop  a  general  latent  structure  model  of  job  performance  for  entry  level 
skilled  jobs,  at  least  as  they  are  represented  by  the  population  of  jobs 
performed  by  enlisted  personnel  in  the  U.  S.  Army. 




Table  1 


Army  Selection  and  Classification  Project 


Operational Objectives

1)  Develop new measures of job performance that can be used as criteria
against which to validate selection/classification measures.

2)  Validate existing selection measures against both existing and
project-developed criteria.

3)  Develop and validate new selection and classification measures.

4)  Develop a utility scale for different performance levels across MOS.

5)  Estimate the relative effectiveness of alternative selection and
classification procedures in terms of their validity and utility.

Research Objectives


1)  Identify the constructs that constitute the universe of information
available for selection/classification into entry level skilled jobs.

2)  Develop a general model of performance for entry level skilled jobs.

3)  Investigate the construct validity of the "method" variance in job
performance measures.

4)  Describe the utility functions and the utility metrics that individuals
actually use when estimating "utility of performance".

5)  Estimate the degree of differential prediction across (a) major domains
of predictor information (e.g., abilities, personality, interests), (b)
major factors of job performance, and (c) different types of jobs.

6)  Determine the extent of differential prediction across racial and gender
groups for a systematic sample of individual differences, performance
factors, and jobs.

7)  Develop  new  statistical  estimators  of  classification  efficiency. 




If the latent structure of job performance and the taxonomic structure
of selection/classification prediction information is modeled, and measures
are developed to assess the major constructs using samples of soldiers from a
representative sample of jobs from a large population of jobs, then a number
of interesting questions can be examined systematically. For example, to
what extent is differential prediction possible across major components of
performance? To what extent is there differential prediction across the
major components of the predictor universe? To what extent does validity
generalization across jobs depend upon the performance component being
assessed? For any differential prediction across race and gender groups,
what is the source of such differential regressions in terms of predictor
component/performance component combinations? What happens to the overall
regression picture when those components are omitted?


Basic  Project  Design 

The basic design of Project A, shown in Figure 1, is simply that of a very
large test validation study that incorporates several independent data
collections.

There are four major data files. The first consists of the available
computer records for people who joined the Army in 1981 and 1982. The basic
data are the ASVAB, the available training school grades, and the Skills
Qualification Test (SQT), which is a paper-and-pencil measure of current job
knowledge constructed, administered, and scored by the individual's unit
command. Complete data were available on at least 100 people for S3 of the
275 MOS.

The  three  waves  of  new  data  collected  by  the  project  consist  of  (1)  a 
longitudinal  sample  called  the  preliminary  battery  sample  which  is  composed 
of approximately 2,000 recruits in each of 4 MOS, (2) a major concurrent
validation  sample  composed  of  400-600  incumbents  in  each  of  19  MOS,  and  (3) 
an  even  larger  longitudinal  validation  sample  composed  of  over  40,000 
recruits  taken  from  21  MOS.  Besides  providing  different  kinds  of  validity 
information,  the  three  samples  were  intended  to  provide  the  opportunity  for 
multiple  revisions  of  the  new  predictor  battery.  The  preliminary  battery 
sample  was  assessed  with  a  four-hour  battery  of  carefully  selected 
off-the-shelf  tests  to  provide  a  set  of  marker  variables  for  the  project- 
developed  tests.  Approximately  one-fifth  of  this  sample  became  a  part  of  the 
concurrent  validation  sample,  which  was  the  first  time  the  full  array  of 
project-developed  tests  and  performance  measures  were  administered  together. 
Each  job  incumbent  in  the  concurrent  validation  sample  was  assessed  eight 
hours  each  day  for  two  days.  The  longitudinal  validation  builds  upon  the 
concurrent  findings  and  is  designed  to  yield  a  sample  of  400-600  per  MOS 
after  the  decay  rates  for  the  MOS  cohorts  have  their  effect.  To  produce  a 
sample  of  10,000  incumbents  at  the  time  of  job  performance  assessment, 
approximately  45,000  new  recruits  are  being  tested  on  the  predictor  battery. 
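The sample-size arithmetic in the preceding paragraph can be sketched as follows. This is only an illustration: the function names and the use of a single overall retention rate are assumptions, whereas the project refers to MOS-specific decay rates that are not reported in this paper. The two figures used in the demonstration (10,000 incumbents, roughly 45,000 recruits) are taken from the text.

```python
import math


def implied_retention(target_incumbents: int, recruits_tested: int) -> float:
    """Overall fraction of tested recruits expected to remain at criterion time."""
    if recruits_tested <= 0:
        raise ValueError("recruits_tested must be positive")
    return target_incumbents / recruits_tested


def recruits_needed(target_incumbents: int, retention_rate: float) -> int:
    """Recruits to test so that target_incumbents remain, for one assumed rate."""
    if not 0.0 < retention_rate <= 1.0:
        raise ValueError("retention_rate must be in (0, 1]")
    return math.ceil(target_incumbents / retention_rate)


if __name__ == "__main__":
    # Figures quoted in the text: about 45,000 recruits tested, 10,000 wanted.
    rate = implied_retention(10_000, 45_000)
    print(f"implied overall retention: {rate:.3f}")
    # Hypothetical alternative rate, for comparison only.
    print("recruits needed at a 25% rate:", recruits_needed(10_000, 0.25))
```

In practice the calculation would be repeated per MOS, since each cohort decays at its own rate.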

The  reenlistees  from  both  the  concurrent  sample  (83/84  cohort)  and  from 
the  longitudinal  sample  (86/87  cohort)  will  be  followed  into  their  second 




tour and assessed with another array of job measures. During the
second tour the job tasks require a higher level of skill and the
supervisor/leadership component becomes much more prominent.


Predictor  Development 

The standard operating procedure for predictor development in personnel
selection research is to do a job analysis first. On the basis of a job
analysis, the knowledges, skills, and abilities (KSA) required for successful
performance are inferred, and an additional judgment is then made about which
KSA are trainable and which must be selected for. We didn't precisely do
that in Project A.


Instead, the strategy was to identify a universe of potential predictor
constructs appropriate for the population of enlisted MOS, sample
representatively from it, construct tests for each construct sampled, and
refine and improve the measures through a long series of pilot and field
tests. The intent was to develop a predictor battery that was maximally
useful for an entire population of jobs and not to tailor-make them for the
specific jobs in the sample. The loss in specific prediction accuracy for
the jobs in the sample (if any) should be compensated for by the gain in
coverage for all other jobs in the population.

The  long  process  of  predictor  development  is  represented  in  Figure  2. 

It  began  with  an  exhaustive  search  of  the  entire  personnel  selection 
literature.  Research  teams  were  created  for  cognitive  abilities,  perceptual 
and psychomotor abilities, and non-cognitive characteristics such as
temperament,  interest,  and  biographical  history.  Every  available  automated 
and  manual  technique  was  used  in  the  search  and  an  initial  list  of  several 
hundred  variables  was  compiled.  The  list  went  through  several  waves  of 
expert  reviews  and  eventually  came  down  to  a  list  of  53  potentially  useful 
predictor  constructs.  They  are  listed  in  Table  2. 

A  similar,  but  different,  procedure  was  used  to  identify  a  population  of 
performance  factors  -  72  in  all.  We  then  assembled  a  sample  of  35  personnel 
selection  experts  and  asked  them  to  estimate  the  correlation  between  each 
predictor  construct  and  each  criterion  factor,  when  that  correlation  was 
corrected for restriction of range and criterion unreliability. The
resulting  judgments  were  analyzed  for  inter-judge  agreement,  rows  and  columns 
were  factor  analyzed,  and  the  results  were  compared  to  analogous  information 
from  the  empirical  literature.  Most  importantly,  however,  the  exercise 
provided  another  substantial  set  of  expert  judgments  about  which  predictor 
constructs  should  be  the  most  useful.  A  hierarchical  analysis  of  the 
predictor  validity  profiles  is  also  shown  in  Table  2. 
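The experts were asked to judge correlations "corrected for restriction of range and criterion unreliability". A minimal sketch of two textbook corrections of that kind follows. The paper does not say which formulas were intended, so the choice of the Thorndike Case II range-restriction formula and the simple disattenuation formula is an assumption, as are the function names and the example values.

```python
import math


def correct_for_criterion_unreliability(r_xy: float, r_yy: float) -> float:
    """Disattenuate a validity coefficient for criterion unreliability only."""
    if not 0.0 < r_yy <= 1.0:
        raise ValueError("criterion reliability must be in (0, 1]")
    return r_xy / math.sqrt(r_yy)


def correct_for_range_restriction(r_xy: float, sd_unrestricted: float,
                                  sd_restricted: float) -> float:
    """Thorndike Case II correction (direct restriction on the predictor)."""
    if sd_unrestricted <= 0 or sd_restricted <= 0:
        raise ValueError("standard deviations must be positive")
    u = sd_unrestricted / sd_restricted
    return (r_xy * u) / math.sqrt(1.0 + r_xy ** 2 * (u ** 2 - 1.0))


if __name__ == "__main__":
    # Hypothetical values: observed validity .30, predictor SD 10 in the
    # applicant population and 7 among those selected, criterion reliability .80.
    r_range = correct_for_range_restriction(0.30, 10.0, 7.0)
    r_both = correct_for_criterion_unreliability(r_range, 0.80)
    print(f"range-corrected: {r_range:.3f}, fully corrected: {r_both:.3f}")
```

The order in which the two corrections are applied depends on the population in which the criterion reliability was estimated; the order shown here is only one possibility.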




Figure 2. Flow Chart of Predictor Measure Development Activities of Project A.


(Numbered entries are CONSTRUCTS, grouped into CLUSTERS, which are in turn
grouped into FACTORS.)

FACTOR: COGNITIVE ABILITIES

   Cluster: Verbal Ability/General Intelligence
       1.  Verbal Comprehension
       5.  Reading Comprehension
      16.  Ideational Fluency
      18.  Analogical Reasoning
      21.  Omnibus Intelligence/Aptitude
      22.  Word Fluency

   Cluster: Reasoning
       4.  Word Problems
       8.  Inductive Reasoning: Concept Formation
      10.  Deductive Logic

   Cluster: Number Ability
       2.  Numerical Computation
       3.  Use of Formula/Number Problems

   Cluster: Perceptual Speed and Accuracy
      12.  Perceptual Speed and Accuracy

   Cluster: Investigative Interests
      49.  Investigative Interests

   Cluster: Memory
      14.  Rote Memory
      17.  Follow Directions

   Cluster: Closure
      19.  Figural Reasoning
      23.  Verbal and Figural Closure

FACTOR: VISUALIZATION/SPATIAL

   Cluster: Visualization/Spatial
       6.  Two-dimensional Mental Rotation
       7.  Three-dimensional Mental Rotation
       9.  Spatial Visualization
      11.  Field Dependence (Negative)
      15.  Place Memory (Visual Memory)
      20.  Spatial Scanning

FACTOR: INFORMATION PROCESSING

   Cluster: Mental Information Processing
      24.  Processing Efficiency
      25.  Selective Attention
      26.  Time Sharing

FACTOR: MECHANICAL

   Cluster: Mechanical Comprehension
      13.  Mechanical Comprehension

   Cluster: Realistic vs. Artistic Interests
      48.  Realistic Interests
      51.  Artistic Interests (Negative)

FACTOR: PSYCHOMOTOR

   Cluster: Steadiness/Precision
      28.  Control Precision
      29.  Rate Control
      32.  Arm-hand Steadiness
      34.  Aiming

   Cluster: Coordination
      27.  Multilimb Coordination
      35.  Speed of Arm Movement

   Cluster: Dexterity
      30.  Manual Dexterity
      31.  Finger Dexterity
      33.  Wrist-finger Speed

FACTOR: SOCIAL SKILLS

   Cluster: Sociability
      39.  Sociability
      52.  Social Interests

   Cluster: Enterprising Interests
      50.  Enterprising Interests

FACTOR: VIGOR

   Cluster: Athletic Abilities/Energy
      36.  Involvement in Athletics and Physical Conditioning
      37.  Energy

   Cluster: Dominance/Self-esteem
      41.  Dominance
      42.  Self-esteem

FACTOR: MOTIVATION/STABILITY

   Cluster: Traditional Values/Conventionality/Non-delinquency
      40.  Traditional Values
      43.  Conscientiousness
      46.  Non-delinquency
      53.  Conventional Interests

   Cluster: Work Orientation/Locus of Control
      44.  Locus of Control
      47.  Work Orientation

   Cluster: Cooperation/Emotional Stability
      38.  Cooperativeness
      45.  Emotional Stability

Table  2.  Hierarchical  Map  of  Predictor  Space 




All the available information was then used to arrive at a final set of
variables for which new measures would be constructed. This represented
months of effort by lots of people to select the variables that would best
supplement the ASVAB in predicting job performance across all MOS. What
followed were many months more of instrument construction, several waves of
pilot tests, and a series of major field tests. Included in these efforts
were the development of a computerized battery of perceptual/psychomotor
tests, the creation of the software, the design and construction of a special
response pedestal permitting a variety of responses (e.g., one hand tracking,
two hand coordination), and the acquisition of 74 portable computerized
testing stations. After each data collection, revisions were made on the
basis of item statistics and expert review. Finally, on May 15, 1985, the
predictor battery was deemed ready for concurrent validation. That battery,
known as the Trial Battery (TB), is listed in Table 3.


Performance  Measurement 


The  goals  of  training  and  job  performance  measurement  in  Project  A  were 
to  define,  or  model,  the  total  domain  of  performance  in  some  reasonable  way 
and  then  develop  reliable  and  valid  measures  of  each  major  factor. 

Some additional specific goals were to: a) make a state-of-the-art
attempt to develop job sample or "hands-on" measures of job task proficiency,
b) compare hands-on measurement to paper-and-pencil tests and rating measures
of proficiency on the same tasks (i.e., a multi-trait, multi-method
approach), c) develop standardized measures of training achievement for the
purpose of determining the relationship between training performance and job
performance, and d) evaluate existing archival and administrative records as
possible indicators of job performance.

Given  these  intentions,  the  criterion  development  effort  focused  on 
three  major  methods:  hands-on  job  sample  tests,  multiple  choice  knowledge 
tests,  and  ratings.  The  behaviorally  anchored  rating  scale  (BARS)  procedure 
was  extensively  used  in  the  development  of  the  rating  methods. 


Modeling  Performance 

The development efforts to be described were guided by a particular
"theory" of performance. The basic outline is as follows:

First,  job  performance  really  is  multi-dimensional.  There  is  not  one 
outcome,  one  factor,  or  one  anything  that  can  be  pointed  to  and  labeled  as 
job  performance.  It  is  manifested  by  a  wide  variety  of  behaviors,  or  things 
people  do,  that  are  judged  to  be  important  for  accomplishing  the  goals  of  the 
organization. 




Table 3


Summary of Predictor Measures Used in Concurrent Validation

(The Trial Battery)


COGNITIVE PAPER-AND-PENCIL TESTS

Test Name (Construct Name)                                      Number of Items

Reasoning Test (Induction-figural reasoning)                          30
Orientation Test (Spatial orientation)                                24
Map Test (Spatial orientation)                                        20
Object Rotation Test (Spatial visualization - Rotation)               90
Assembling Objects Test (Spatial visualization - Rotation)            32
Maze Test (Spatial visualization - Scanning)                          24


COMPUTER-ADMINISTERED TESTS

Test Name (Construct Name)                                      Number of Items

Simple Reaction Time (Processing efficiency)                          15
Choice Reaction Time (Processing efficiency)                          30
Memory Test (Short-term memory)                                       36
Target Tracking Test #1 (Psychomotor precision)                       13
Target Shoot Test (Psychomotor precision)                             30
Perceptual Speed and Accuracy Test (Perceptual speed and accuracy)    36
Identification Test (Perceptual speed and accuracy)                   36
Target Tracking Test #2 (Two hand coordination)                       13
Number Memory Test (Number operations)                                28
Cannon Shoot Test (Movement judgment)                                 36


NON-COGNITIVE PAPER-AND-PENCIL INVENTORIES

Inventory Name and Constructs                                   Number of Items

Assessment of Background and Life Experiences (ABLE) Inventory       209

    Adjustment
    Dependability
    Achievement
    Physical Condition
    Leadership
    Locus of Control
    Agreeableness/Likeability

Army Vocational Interest Career Examination (AVOICE)                 176

    Realistic Interests
    Conventional Interests
    Social Interests
    Enterprising Interests
    Artistic Interests



Two  General  Factors 


For the population of entry level enlisted positions we postulated that
there are two major types of job performance components. The first is
composed of components that are specific to a particular job. That is,
measures of such components would reflect specific technical competence or
specific job behaviors that are not required for other jobs. The second kind
of performance factor includes components that are defined and measured in
the same way for every job. These are referred to as Army-wide criterion
factors.

For the job specific components, we anticipated that there would be a
relatively small number of distinguishable factors of technical performance
that would be a function of different abilities or skills and which would be
reflected by different task content.

The  Army-wide  concept  incorporates  the  basic  notion  that  total 
performance  is  much  more  than  task  or  technical  proficiency.  It  might 
include  such  things  as  contributions  to  teamwork,  continual  self-development, 
support for the norms and customs of the organization, and perseverance in
the  face  of  adversity. 

In  sum,  the  working  model  of  total  performance  with  which  the  project 
began  viewed  performance  as  multi-dimensional  within  the  two  broad  categories 
of  factors.  The  job  analysis  and  criterion  construction  methods  were 
designed  to  "discover"  the  content  of  these  factors  via  an  exhaustive 
description  of  the  total  performance  domain,  several  iterations  of  data 
collection,  and  the  use  of  multiple  methods  for  identifying  basic  performance 
factors. 


Factors  vs.  a  Composite 


Saying  that  performance  is  multi-dimensional  does  not  preclude  using 
just  one  index  of  an  individual's  contributions  to  make  a  specific  personnel 
decision  (e.g.,  select/not  select,  promote/not  promote).  As  argued  by 
Schmidt  and  Kaplan  (1971)  some  years  ago,  it  seems  quite  reasonable  for  the 
organization to scale the importance of each major performance factor
relative  to  a  particular  personnel  decision  that  must  be  made  and  to  combine 
the  weighted  factor  scores  into  a  composite  that  represents  the  total 
contribution  or  utility  of  an  individual's  performance,  within  the  context  of 
that  decision.  That  is,  the  way  in  which  performance  information  is  weighted 
and  combined  is  a  value  judgment  on  the  organization's  part.  The 
determination  of  the  specific  combinational  rules  (e.g.,  simple  sum,  weighted 
sum,  non-linear  combination)  that  best  reflect  what  the  organization  is 
trying  to  accomplish  is  a  matter  of  research. 
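As a concrete illustration of the weighted-sum option mentioned above, the sketch below standardizes each performance factor and combines the factors with organization-supplied importance weights. The factor names, the weights, and the choice to standardize with the population standard deviation are all hypothetical; the paper leaves the actual combination rule as a matter for research.

```python
from statistics import mean, pstdev


def standardize(scores):
    """Convert raw scores to z-scores (population standard deviation)."""
    m = mean(scores)
    s = pstdev(scores)
    if s == 0:
        raise ValueError("scores have zero variance")
    return [(x - m) / s for x in scores]


def weighted_composite(factor_scores, weights):
    """Weighted sum of standardized factor scores, one value per person.

    factor_scores: dict mapping factor name -> list of raw scores
    weights:       dict mapping the same factor names -> importance weight
    """
    if set(factor_scores) != set(weights):
        raise ValueError("factor_scores and weights must have the same keys")
    lengths = {len(v) for v in factor_scores.values()}
    if len(lengths) != 1:
        raise ValueError("every factor needs one score per person")
    z = {name: standardize(vals) for name, vals in factor_scores.items()}
    n = lengths.pop()
    return [sum(weights[name] * z[name][i] for name in z) for i in range(n)]


if __name__ == "__main__":
    # Hypothetical factors and weights for a single personnel decision.
    scores = {"technical": [52, 61, 47, 70], "discipline": [3, 5, 4, 2]}
    weights = {"technical": 0.7, "discipline": 0.3}
    print(weighted_composite(scores, weights))
```

A different personnel decision (e.g., promotion rather than selection) would simply use a different set of weights on the same factor scores.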


A  Structural  Model 

If performance is characterized in the above manner, then a more formal
way  to  model  performance  is  to  think  in  terms  of  its  latent  structure, 
postulate  what  that  might  be  and  then  resort  to  a  confirmatory  analysis. 




Unfortunately, it is true that we simply know a lot more about predictor
constructs than we do about job performance constructs. There are volumes of
research on the former, and almost none on the latter. For personnel
psychologists it is almost second nature to talk about predictors in terms of
theories and constructs. However, on the performance side, the textbooks are
virtually silent. Only a few people have even raised the issue (e.g.,
Dunnette, 1963; Wallace, 1965).


Unit  vs.  Individual  Performance 

Finally, people do not usually work alone. Individuals are members of
work groups or units and it is the unit's performance that frequently is the
most central concern of the organization. However, determining the
individual's contribution to the unit's performance is not a simple problem.
Further, variation in unit performance is most likely a function of a number
of factors besides the "true" level of performance of each individual.

For two major reasons, Project A has not incorporated unit effectiveness
in its model of performance. First, the project is focused on the
development of a new selection/classification system for entry level
personnel and is concerned with improving personnel decisions about
individuals and not units. The task is to maximize the average payoff per
individual selected.

The  second  major  reason  is  the  prohibitive  cost.  It  simply  was  not 
possible  to  develop  reliable  and  valid  field  exercises  for  assessing  unit 
performance  in  a  representative  sample  of  jobs  within  a  reasonable  time 
frame. In isolated instances it might be possible to take advantage of
regularly scheduled exercises or use existing performance records that a
particular  unit  (e.g.,  maintenance  depot)  might  keep.  However,  it  proved  not 
possible  to  obtain  such  data  in  any  systematic  way.  Even  if  it  could  be 
done,  it  would  not  be  easy  to  establish  the  correspondence  between  individual 
performance  and  unit  effectiveness. 

What  we  have  chosen  to  do  is  to  try  to  identify  the  factors,  or  means, 
by  which  individuals  contribute  to  unit  performance  and  to  assess  individual 
performance  on  those  factors  via  rating  methods.  We  also  have  a  certain 
amount  of  information  on  situational  and  unit  characteristics  and  are 
attempting  to  determine  how  much  of  the  variance  in  individual  performance  is 
accounted for by those characteristics.


Criterion  Development 


Actual  criterion  development  proceeded  from  two  basic  types  of 
information.  First,  all  available  task  descriptions  were  used  to  generate  a 
population  of  job  tasks  for  each  MOS.  The  principal  sources  of  task 
description  are  the  Army's  periodic  job  description  surveys,  which  use 
questionnaire  checklists  of  several  hundred  task  statements  to  survey  job 
incumbents  about  the  frequency  with  which  they  perform  each  task,  and  the 
Soldier's  Manual  for  each  job  which  is  a  complete  specification  by  management 
of  what  the  task  content  of  the  job  is  supposed  to  be.  The  two  sources 
describe tasks at a somewhat different level of generality with the occupational
survey items being much more specific in nature.




Unfortunately, no textbook or available technology tells us what the
specifications of a task description should be for different purposes. We
opted for statements which described a complete operation, which had a
recognizable beginning and end, and which were relatively independent of
other tasks. That is, it is possible to perform Task A without performing
Task B. After much editing, revising, and a formal review by a panel of
subject  matter  experts,  a  population  of  130-130  tasks  was  enumerated  for  each 
MOS.


An additional series of expert judgments was then used to scale the
relative difficulty and importance of each task and to cluster tasks on the
basis of content similarity. Sampling tasks for measurement was accomplished
via a kind of Delphi procedure. That is, each member of a team of task
selectors was asked to select 30 tasks from the population of tasks such that
the selected tasks were representative of task content, were important, and
represented a range of difficulty. The individual judge's choices were then
regressed on the task characteristics and both the choices and the captured
"policy" of each person were fed back to the group members, who each revised
their choices as they saw fit. Typically, convergence was achieved quickly
and the final selection was by consensus. The consensus of the task selection
panel was then thoroughly reviewed by the Army command responsible for
that particular job.
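The "policy capturing" step described above, in which each judge's choices are regressed on the task characteristics, can be sketched as an ordinary least-squares regression of a 0/1 selection indicator on task ratings. This is a minimal sketch under stated assumptions: the paper does not report the regression model actually used, and the function names, the two predictors (importance and difficulty), and the example ratings are hypothetical.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular design matrix")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def capture_policy(task_features, chosen):
    """Least-squares weights (intercept first) for one judge's choices.

    task_features: one tuple of ratings per task
    chosen:        1 if the judge selected the task, else 0
    """
    rows = [[1.0] + [float(v) for v in feats] for feats in task_features]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, chosen)) for i in range(k)]
    return solve_linear_system(xtx, xty)


if __name__ == "__main__":
    # Hypothetical (importance, difficulty) ratings and one judge's picks.
    tasks = [(7, 3), (6, 5), (2, 2), (5, 6), (1, 4), (4, 1), (3, 7), (6, 2)]
    picks = [1, 1, 0, 1, 0, 0, 0, 1]
    intercept, w_importance, w_difficulty = capture_policy(tasks, picks)
    print(f"importance weight {w_importance:.3f}, "
          f"difficulty weight {w_difficulty:.3f}")
```

The fitted weights are the judge's captured "policy"; feeding them back lets each panel member see how heavily he or she actually weighted each task characteristic.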

Standardized job samples, the paper-and-pencil job knowledge tests, and
numerical rating scales were then constructed to assess knowledge and
proficiency on these tasks. Each measure went through multiple rounds of pilot
testing and revision. The job sample tests were fairly elaborate and were
composed of multiple stations sometimes spread over an area of football field
size. Each task to be tested was broken down into several steps, each of
which was scored pass/fail.
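One plausible way to turn the pass/fail step records into scores is shown below. The paper does not state how step results were aggregated, so the percent-of-steps-passed task score, the unweighted mean across tasks, and the task names are assumptions made only for illustration.

```python
def task_score(step_results):
    """Percentage of steps passed on one hands-on task (True = pass)."""
    if not step_results:
        raise ValueError("a task needs at least one scored step")
    return 100.0 * sum(bool(s) for s in step_results) / len(step_results)


def hands_on_score(tasks):
    """Unweighted mean of task scores for one soldier.

    tasks: dict mapping a task name to its list of pass/fail step results
    """
    if not tasks:
        raise ValueError("at least one task is required")
    return sum(task_score(steps) for steps in tasks.values()) / len(tasks)


if __name__ == "__main__":
    # Hypothetical record for one soldier on two tasks.
    record = {
        "task_a": [True, True, False, True],
        "task_b": [True, True, True, True, False],
    }
    for name, steps in record.items():
        print(name, task_score(steps))
    print("overall", hands_on_score(record))
```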

The second procedure used to describe job content was the critical
incident method. Panels of NCO and officers generated thousands of critical
incidents of effective and ineffective performance. There were two basic
formats for the critical incident workshops. One asked participants to
generate incidents that potentially could occur in any job. The second type
focused on incidents that were specific to the content of the particular job
under consideration. The behaviorally anchored rating scale procedure was
used to construct rating scales for performance factors specific to a
particular job (MOS-specific BARS) and performance factors that were defined in
the same way and relevant for all jobs (Army-wide BARS).

The  critical  incident  procedure  was  also  used  with  workshops  of  combat 
veterans  to  develop  rating  scales  of  "expected"  combat  effectiveness. 

Since  one  major  objective  was  to  determine  the  relationships  between 
training performance and job performance and their differential
predictability, if any, a comprehensive training achievement test was constructed
for  each  MOS  by  carefully  matching  the  content  of  the  program  of  instruction 
(POI)  with  the  content  of  the  population  of  job  tasks,  and  writing  items  to 
represent  each  segment  of  the  match.  We  were  most  interested  in  task  content 
which  is  taught,  and  also  performed  on  the  job,  versus  tasks  which  were 
performed  on  the  job  but  not  part  of  the  POI.  Scores  on  this  latter  category 
of  items  (when  given  to  trainees)  would  be  a  measure  of  incidental  learning. 




The correlation of direct learning and incidental learning with job
performance, both when initial ability is controlled and when it is not, is
of considerable interest.
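Controlling for initial ability when correlating a learning score with job performance is conventionally done with a first-order partial correlation. The sketch below shows that standard formula; treating it as the analysis the project would use is an assumption, and the variable names and example correlations are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two lists of equal length >= 2")
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def partial_correlation(r_xy, r_xz, r_yz):
    """Correlation of x and y with z held constant (first-order partial)."""
    denom = math.sqrt((1.0 - r_xz ** 2) * (1.0 - r_yz ** 2))
    if denom == 0:
        raise ValueError("x or y is perfectly correlated with z")
    return (r_xy - r_xz * r_yz) / denom


if __name__ == "__main__":
    # Hypothetical correlations: x = incidental learning score,
    # y = job performance, z = initial ability (e.g., an ASVAB composite).
    r_uncontrolled = 0.40
    r_controlled = partial_correlation(r_uncontrolled, r_xz=0.50, r_yz=0.45)
    print(f"ability not controlled: {r_uncontrolled:.2f}")
    print(f"ability controlled:     {r_controlled:.2f}")
```

Comparing the two values for the direct-learning and incidental-learning item sets shows how much of each relationship with job performance is attributable to initial ability.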

The final entry in the array of criterion measures was produced by a
concerted effort to get what we could from the files or archival records.
Potentially at least, there are numerous performance indicators lurking in
existing computer records and personnel files. We began by enumerating all
possibilities from three major sources of such records.

The Enlisted Master File (EMF) - a central computer record of
selected personnel actions.

The Enlisted Military Personnel File (EMPF) - which is the
permanent historical record of an individual's military service
kept on microfiche at a central location.

The Military Personnel Records Jacket (MPRJ) - or more commonly
known as the 201 file, which is the personnel folder that
follows the individual.

We  systematically  compared  these  three  sources  using  a  sample  of  750 
people  and  a  standardized  information  recording  form.  The  201  file  looked 
the most promising in terms of recency and completeness, but of course it is
by far the most expensive to search. (The textbooks never mention these
cost-benefit  questions.)  As  a  consequence,  everyone  crossed  their  fingers 
and  we  collected  eight  archival  performance  indicators  via  a  self-report 
questionnaire.  That  is,  people  were  asked  what  was  in  their  personnel  file 
as  regards  letters  of  commendation,  disciplinary  actions,  etc.  Field  tests 
on  a  sample  of  500  people  showed  considerable  agreement  between  self-report 
and  archival  records.  Almost  all  disagreements  were  in  the  direction  of  more 
frequent  self-reports,  for  both  positive  and  negative  things.  Further 
followup  questionnaires  and  interviews  suggested  that  self-report  may  be  the 
more  accurate.  Anyway,  we  used  them  and  their  distributions  and  correlations 
seemed  quite  reasonable.  The  self-report  items  were  combined  into  four 
indicators  that  were  actually  used  as  criterion  measures. 

The complete array of performance measures in the form in which they
survived a large scale field study of N = 150/MOS for nine MOS is shown in
Table 4.


These are the measures which were administered to the concurrent sample
of 400-600 people in each of the 19 MOS. The distinction between the Batch A
(9 MOS) and Batch Z (10 MOS) is that not all criterion measures were
developed for each job in Batch Z. Budget constraints dictated that the
job-specific measures could only be developed for a limited number of jobs
(i.e., Batch A).




Table 4


Summary of Criterion Measures Used in Concurrent
Validation Samples(a)


Performance Measures Common to Batch A and Batch Z MOS (Jobs)

1.  Ten behaviorally anchored rating scales designed to measure factors of
non-job-specific performance (e.g., giving peer leadership and support,
maintaining equipment, self-discipline).

2.  Single scale rating of overall job performance.

3.  Single scale rating of NCO (non-commissioned officer) potential.

4.  Paper-and-pencil Test of Training Achievement developed for each of the
19 MOS (130-210 items each).

5.  A 40-item summated rating scale for the assessment of expected combat
performance.

6.  Five performance indicators from administrative records. The first
four are obtained via self-report and the last one from computerized
records.

o  Total number of awards and letters of commendation.

o  Physical fitness qualification.

o  Number of disciplinary infractions.

o  Rifle marksmanship qualification score.

o  Promotion rate (in deviation units).

Performance Measures for Batch A Only

7.  Job-sample (hands-on) test of MOS-specific task proficiency.

o  Individual is tested on each of 15 major job tasks.

8.  Paper-and-pencil job knowledge tests designed to measure task-specific
job knowledge.

o  Individual is scored on 150-200 multiple choice items representing
30 major job tasks. Fifteen of the tasks were also measured
hands-on.

9.  Rating scale measures of specific task performance on the 15 tasks also
measured with the knowledge tests and the hands-on measures.

10.  MOS-specific behaviorally anchored rating scales. From 7 to 13 BARS
were developed for each MOS to represent the major factors that
constituted job-specific technical and task proficiency.

Performance Measures for Batch Z Only

11.  Ratings of performance on 13 representative "common" tasks. The Army
specifies a series of common tasks (e.g., several first aid tasks) that
everyone should be able to perform.

Auxiliary Measures Included in Criterion Battery

12.  Job History Questionnaire which asks for information about frequency and
recency of performance of the MOS-specific tasks.

13.  Work Environment Description Questionnaire - a 141-item questionnaire
assessing situational/environmental characteristics, leadership climate,
and reward preferences.


(a) All rating measures were obtained from approximately 2 supervisors and 3
peers for each ratee.




Results From the Concurrent Validation Sample


If all the rating scales are used separately, the MOS-specific measures
are aggregated at the task or instructional module level, and the major
predictor subscales are used, there are approximately 200 criterion scores
and 60-70 predictor scores on each individual.

At this point, a classic argument arises between the empirical keying,
"let's look for all the specific variance we can" types and the individuals
who want to reduce collinearity as much as possible and deal at the construct
level. We have tried for more of the latter than the former for a number of
reasons. One reason is that we would like the project to produce as many
generalizable truths as possible. Another stems from the dilemma between
accuracy of prediction and accuracy of estimation, or the cross validation
problem. That is, the more a prediction equation maximizes the accuracy of
prediction in the sample, the more error it introduces into the estimation of
the degree of accuracy in the population.
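
The trade-off between in-sample accuracy and the estimate of population
accuracy can be illustrated with a small simulation. The sketch below is not
the project's analysis; the sample size, the number of predictors, and the
data-generating model are all assumptions chosen only to make the effect
visible. Weights are fit by least squares in one sample and then applied
unchanged to a fresh sample from the same population.

```python
import numpy as np

def fit_ols(X, y):
    """Least-squares weights; X must already contain an intercept column."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta

def multiple_r(X, y, beta):
    """Correlation between the criterion and the weighted predictor composite."""
    return float(np.corrcoef(X @ beta, y)[0, 1])

def simulate(n=150, k=60, n_signal=5, seed=0):
    """Fit on one hypothetical sample, then cross-validate on a second one."""
    rng = np.random.default_rng(seed)

    def draw(n_rows):
        Z = rng.standard_normal((n_rows, k))
        # Only the first n_signal predictors carry any true validity.
        y = Z[:, :n_signal].sum(axis=1) + 3.0 * rng.standard_normal(n_rows)
        return np.column_stack([np.ones(n_rows), Z]), y

    X_fit, y_fit = draw(n)
    X_new, y_new = draw(n)
    beta = fit_ols(X_fit, y_fit)
    return multiple_r(X_fit, y_fit, beta), multiple_r(X_new, y_new, beta)

r_sample, r_cross = simulate()
print(f"R in fitting sample: {r_sample:.2f}; R in fresh sample: {r_cross:.2f}")
```

With many predictors relative to cases, the fitting-sample R overstates what
the same weights achieve in new data, which is one argument for working with
a small number of construct scores.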

Project A is faced with the task of estimating several kinds of
differential validity. It is reasonable to ask at the outset whether it is even
possible,  for  a  system  of  any  multivariate  complexity,  to  detect  reasonable 
amounts  of  differential  prediction  with  reasonable  amounts  of  statistical 
power.  The  fewer  parameters  one  must  estimate,  the  greater  the  chances  of 
being  able  to  do  that,  which  is  a  primary  reason  for  examining  the  latent 
structure  of  predictors  and  criteria  as  carefully  as  possible. 

Since  we  can  draw  a  fairly  reasonable  picture  of  the  population 
variance  matrices  for  both  predictors  and  criteria  and  thus  provide  a  better 
starting  point  for  Monte  Carlo  studies,  one  major  research  question  we  hope 
to  answer  is  whether  it  is  ever  possible  to  estimate  the  parameters  necessary 
for  building  a  true  classification  algorithm.  If  it  can't  be  done  with  a 
sample of 20 jobs and 500 cases per job, then perhaps the textbook
discussions of the classification problem are a bit academic.
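
A heavily simplified version of such a Monte Carlo question can be sketched
as follows. The real classification problem is multivariate; this example
looks only at one predictor and two jobs, and asks how often a difference
between two population validities would be detected with a given number of
cases per job. The Fisher z test, the validities, and the function name are
illustrative assumptions, not the project's procedure.

```python
import numpy as np

def validity_difference_power(n_per_job, validity_a, validity_b,
                              n_reps=300, seed=0):
    """Share of simulated studies in which two job validities differ
    significantly (two-sided Fisher z test at the .05 level)."""
    rng = np.random.default_rng(seed)
    se = np.sqrt(2.0 / (n_per_job - 3))  # SE of a difference of two Fisher z's
    rejections = 0
    for _ in range(n_reps):
        z_values = []
        for rho in (validity_a, validity_b):
            x = rng.standard_normal(n_per_job)
            # Criterion constructed so its population correlation with x is rho.
            y = rho * x + np.sqrt(1.0 - rho ** 2) * rng.standard_normal(n_per_job)
            z_values.append(np.arctanh(np.corrcoef(x, y)[0, 1]))
        if abs(z_values[0] - z_values[1]) / se > 1.96:
            rejections += 1
    return rejections / n_reps

print(validity_difference_power(500, 0.50, 0.30))  # sizeable true difference
print(validity_difference_power(500, 0.40, 0.40))  # no true difference
```

Even in this one-predictor case the detectable difference is fairly large;
every additional parameter to be estimated makes the problem harder, which
is the point of the argument above.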


The  Road  to  Constructs 


For both predictors and criteria, the procedure for getting from the
individual task or scale scores to factor or construct scores was similar,
except for the degree to which the previous literature was of help. Many
decades of research on the measurement of abilities, personality, and
interests have provided a lot of information about the structure of
individual differences. Similar help from the performance side is really not
available except for a modest number of descriptive studies of specific
occupations such as managers, nurses, police officers, fire fighters, and the
elusive and seldom seen college professor. Unfortunately, we were operating
in a different job population and knew only that paper-and-pencil measures
and rating measures would produce a lot of so-called method variance.

Given  this  initial  disparity,  we  used  both  expert  judgment  and  factor 
analytic  results  from  the  field  tests  to  formulate  a  target  model.  A  picture 
of  that  model  is  shown  in  Figure  3. 




[Figure 3. Job performance: a proposed structural model]


This picture is included only to show one stage in the almost continuous
process of bootstrapping ourselves toward a more final conceptual description
of the predictor/criterion space.

The target model was then subjected to what might be described as a
"quasi" confirmatory analysis using the concurrent validation sample. For
the predictor scales that meant using the target to specify the number of
factors for a full sample solution (i.e., all MOS combined). The predictor
constructs and their associated component scales are shown in Table 5.


For the within-MOS criterion matrices we used confirmatory analyses and
attempted to test alternative models. The alternative models were obtained
by allowing the principal investigators to first look at the data, in the
form of a series of principal component analyses, and to formulate a target
matrix for a LISREL solution. Some clear alternative ideas emerged and these
were compared in each MOS. After not too much cutting and fitting, we
arrived at a single portrayal of the latent structure of performance that
both fit the data in each job and seemed to make good sense. Obviously, the
confirmatory analysis was not used in a strictly confirmatory way. This
structure of job performance is portrayed in Table 6.


The model best confirmed by LISREL specified five "substantive" factors
plus "ratings" and "written test" method factors that were orthogonal to the
substantive factors and to each other. The first two substantive factors
are based on the knowledge tests and the job sample measures. We have called
these the core technical performance factor and the general (not so core)
task performance factor. The technical factor reflects content which is
central and largely specific to the MOS. The second factor encompasses
content that tends to be common across several jobs and is less central to the
core performance objectives. For this job population a significant part of
the factor is represented in the common tasks, such as first aid, basic
navigation, use of communication equipment, etc. However, it should be
possible to make this distinction for virtually any job.

The remaining factors are based on the ratings, primarily those
developed by the critical incident method, and the administrative/personnel
records that were collected via self-report. Factor three encompassed the
most scales and was the clearest in terms of its loadings but the most
heterogeneous in terms of content. It appears to be a general factor
combining effort and performance, performance under adverse conditions, and
peer leadership. In a spirit of wishful thinking, we had originally hoped to
separate some of these elements, but either the lack of a distinct latent
structure or the fallibility of the measures prevented it. Factor four is
much more homogeneous and reflects the rating scales having to do with
personal discipline and avoidance of trouble and the number of negative
personnel outcomes people reported. Factor five is fairly narrow in content
and shows very clear loadings for ratings of military bearing and the
physical fitness score that is part of everyone's personnel record.




Table 5


Predictor factors identified via analysis of scores from job incumbents.
The tests and inventory scales from the trial battery that were used to
form factor scores are listed under each factor title.


Spatial Factor
   Assembling Objects test
   Map test
   Maze test
   [illegible]

Psychomotor Factor
   Cannon Shoot test (Time score)
   Target Shoot test (Time to fire)
   Target Shoot test (Log distance)
   Target Tracking 1 (Log distance)
   Target Tracking 2 (Log distance)
   Pooled Mean Movement Time

Perceptual Speed Factor
   Short Term Memory test (Decision time)
   Perceptual Speed & Accuracy test (Decision time)
   Target Identification test (Decision time)

Perceptual Accuracy Factor
   Short Term Memory test (Percent correct)
   Perceptual Speed & Accuracy test (Percent correct)
   Target Identification test (Percent correct)

Number Speed/Accuracy Factor
   Number Memory test (Percent correct)
   Number Memory test (Initial decision time)
   Number Memory test (Mean operations time)
   Number Memory test (Final decision time)

Simple Reaction Speed Factor
   Choice Reaction Time test (Decision time)
   Simple Reaction Time test (Decision time)

Simple Reaction Accuracy Factor
   Choice Reaction Time test (Percent correct)
   Simple Reaction Time test (Percent correct)

Achievement Factor
   Self-esteem scale
   Work Orientation scale
   Energy Level scale

Dependability Factor
   Conscientiousness scale
   Non-delinquency scale

Adjustment Factor
   Emotional Stability scale

Physical Condition Factor
   Physical Condition scale

Skilled Technical Interest Factor
   Clerical/Administrative
   Medical Services
   Leadership/Guidance
   Science/Chemical
   Data Processing
   Mathematics
   Electronic Communications

Structural/Machines Interest Factor
   Mechanics
   Heavy Construction
   Electronics
   Vehicle/Equipment Operator

Combat Related Interest Factor
   Combat
   Rugged Individualism
   Firearms Enthusiast

Audiovisual Arts Interest Factor
   Drafting
   Audiographics
   Aesthetics

Food Service Interest Factor
   Food Service Professional
   Food Service Employee

Protective Services Interest Factor
   Law Enforcement
   Fire Protection

Preference for Organizational Support
   Co-worker Support
   Status
   Serving Others
   Organizational Support
   Ambition

Preference for Routine Work
   Routine

Preference for Job Autonomy
   Autonomy




Table 6


Performance factors representing the common latent structure
across all jobs in the Project A sample. The criterion
measures that comprise each factor are as indicated.


1)  Task Proficiency: Specific core technical skills: The proficiency with
which the individual performs the tasks which are "central" to his or her job
(MOS). The tasks represent the core of the job and they are its primary
definers from job to job.

o  The subscales representing core content in both the knowledge
tests and the job sample tests that loaded on this factor were
summed, standardized, and then added together for a total factor
score. The factor score does not include any rating measures.

2)  Task Proficiency: General or common skills: In addition to the core
technical content specific to an MOS, individuals in every MOS are
responsible for being able to perform a variety of general or common tasks,
e.g., use of basic weapons, first aid, etc. This factor represents
proficiency on these general tasks.

o  The same procedure (as for factor one) was used to compute the
knowledge and hands-on general task scores, standardize within
methods, and add the two standardized scores.

3)  Peer Leadership, Effort, and Self Development: Reflects the degree to
which the individual exerts effort over the full range of job tasks,
perseveres under adverse or dangerous conditions, and demonstrates leadership
and support toward peers. That is, can the individual be counted on to carry
out assigned tasks, even under adverse conditions, to exercise good judgment,
and to be generally dependable and proficient?

o  Five scales from the Army-wide BARS rating form (Technical Knowledge/
Skill, Leadership, Effort, Self-development, and Maintaining Assigned
Equipment), the expected combat performance scales, the job-specific
BARS scales, the general performance rating, and the total number of
commendations and awards received by the individual were summed for
this factor.

4)  Maintaining Personal Discipline: Reflects the degree to which the
individual adheres to Army regulations and traditions, exercises personal
self-control, demonstrates responsibility in day-to-day behavior, and does
not create disciplinary problems.

o  Scores on this factor are composed of three Army-wide BARS scales
(Following Regulations, Self-Control, and Integrity) and two indices
from the administrative records (number of disciplinary actions and
promotion rate).

5)  Physical Fitness and Military Bearing: Represents the degree to which
the individual maintains an appropriate military appearance and bearing and
stays in good physical condition.

o  Factor scores are the sum of the physical fitness qualification score
from the individual's personnel record and two rating scales from the
Army-wide BARS (Military Appearance and Physical Fitness).
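
The scoring rule described for the first two factors (sum the subscales
within each measurement method, standardize each total, then add the two
standardized totals) can be sketched as follows. The function names and the
small data set are hypothetical, and standardizing with the sample standard
deviation is an assumption; the table does not say which estimator was used.

```python
import numpy as np

def zscore(x):
    """Standardize to mean 0 and (sample) standard deviation 1."""
    x = np.asarray(x, dtype=float)
    return (x - x.mean()) / x.std(ddof=1)

def two_method_factor_score(knowledge_subscales, hands_on_subscales):
    """Rows are people; columns are the subscales that loaded on the factor.

    Subscales are summed within method, each method total is standardized,
    and the two standardized totals are added.
    """
    knowledge_total = np.asarray(knowledge_subscales, dtype=float).sum(axis=1)
    hands_on_total = np.asarray(hands_on_subscales, dtype=float).sum(axis=1)
    return zscore(knowledge_total) + zscore(hands_on_total)

# Hypothetical scores for four soldiers on two subscales per method.
knowledge = [[10, 12], [14, 15], [8, 9], [16, 18]]
hands_on = [[3, 4], [5, 5], [2, 3], [6, 6]]
scores = two_method_factor_score(knowledge, hands_on)
print(np.round(scores, 2))
```

Standardizing within method before adding keeps the method with the larger
raw-score variance from dominating the composite.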




In general, this solution fits the data from all MOS, seems reasonable
and appropriate to Army management, and is not too far from our hypothesized
structure, although we hoped to split factors two and three into a few more
pieces.

Given  these  two  pictures  of  the  predictor  domain  and  the  performance 
space,  we  have  begun  exploring  questions  of  differential  validity  across 
criterion  components,  differential  validity  across  jobs,  differential 
validity across subgroups of people, and overall classification efficiency
under  a  variety  of  constraints. 


Criterion  Intercorrelations 


As described in Wise, Campbell, Hanser, and McHenry (1986), five residual
scores were created from the five criterion factors in the following manner.

A paper-and-pencil "methods" factor score was created by first summing the
two paper-and-pencil knowledge tests (job knowledge and training content
knowledge scores) and then partialing out the variance due to the correlation
of the total paper-and-pencil test score with all non-paper-and-pencil
criterion measures (e.g., hands-on scores, rating scores, and administrative
records scores). This residual was defined as the paper-and-pencil method
score. This variable was in turn partialed from the Core Technical
Proficiency criterion factor and from the General Task Proficiency factor,
creating two residual scores. A similar procedure was used to create a
rating method factor score which was in turn partialed from the
Effort/Leadership, Personal Discipline, and Physical Fitness/Military Bearing
factors, thereby creating three more residual scores.
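
The two-step partialing just described can be sketched with ordinary least
squares. This is an illustration under assumptions, not the project's code:
the variable names and the simulated data are hypothetical, and "partialing"
is taken to mean keeping the residual from a linear regression.

```python
import numpy as np

def residualize(y, X):
    """Return the part of y that is not linearly predictable from X."""
    y = np.asarray(y, dtype=float)
    X = np.asarray(X, dtype=float)
    if X.ndim == 1:
        X = X[:, None]
    design = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    return y - design @ beta

def paper_and_pencil_residual(job_knowledge, training_knowledge,
                              other_measures, criterion_factor):
    """Step 1: method score = summed knowledge tests with everything shared
    with the non-paper-and-pencil measures removed.
    Step 2: residual criterion = criterion factor with the method score
    removed."""
    total = np.asarray(job_knowledge) + np.asarray(training_knowledge)
    method_score = residualize(total, other_measures)
    return residualize(criterion_factor, method_score), method_score

# Hypothetical data: 200 people, three non-paper-and-pencil measures.
rng = np.random.default_rng(1)
n = 200
other = rng.standard_normal((n, 3))            # e.g., hands-on, ratings, records
method = rng.standard_normal(n)                # shared written-test variance
job_k = other[:, 0] + method + 0.5 * rng.standard_normal(n)
train_k = other[:, 0] + method + 0.5 * rng.standard_normal(n)
core_tech = job_k + other[:, 0] + 0.5 * rng.standard_normal(n)

resid_score, method_score = paper_and_pencil_residual(job_k, train_k,
                                                      other, core_tech)
print(round(float(np.corrcoef(resid_score, method_score)[0, 1]), 6))
```

By construction the residual criterion score is uncorrelated with the method
score, and the method score is uncorrelated with every measure it was
residualized on.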

The five criterion factor scores, the five residual criterion scores,
the single rating obtained from the overall performance rating scales, and
the total score from the hands-on test were used to generate a 12 x 12 matrix
of criterion intercorrelations for each MOS in Batch A. The averages of these
correlations across MOS are shown in Table 7.
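
Computing one intercorrelation matrix per MOS and then averaging the
matrices can be sketched as below. The MOS labels and the random scores are
placeholders, and a simple unweighted mean of the correlations is an
assumption; the text does not say whether the averages were weighted by
sample size or computed through a transformation.

```python
import numpy as np

def average_intercorrelations(scores_by_mos):
    """scores_by_mos maps an MOS label to a people-by-measures score array.

    Returns the element-wise (unweighted) mean of the within-MOS
    correlation matrices.
    """
    matrices = [np.corrcoef(np.asarray(scores, dtype=float), rowvar=False)
                for scores in scores_by_mos.values()]
    return np.mean(matrices, axis=0)

# Placeholder data: three hypothetical MOS, 12 criterion scores each.
rng = np.random.default_rng(2)
example = {f"MOS_{i}": rng.standard_normal((150, 12)) for i in range(3)}
mean_r = average_intercorrelations(example)
print(mean_r.shape)
```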


Remember that to create the residual scores the paper-and-pencil factor
was  partialed  from  the  first  two  criterion  factors  and  the  rating  method 
factor  was  partialed  from  the  last  three  criterion  factors.  The 
intercorrelations  of  the  5  criterion  factors  are  in  the  upper  left  quadrant, 
the  intercorrelations  among  the  5  residual  scores  are  in  the  lower  right 
quadrant,  and  the  cross  correlations  are  in  the  upper  right  and  lower  left. 
Also  remember  that  the  first  two  factors  contain  items  from  both  the 
knowledge  tests  and  hands-on  tests  and  the  last  three  factors  all  contain 
both  ratings  and  administrative  measures. 

Some  noteworthy  features  of  this  12  x  12  matrix  are  the  following: 

•  The intercorrelations of the factor pairs which confound
measurement method (e.g., 1 with 2 or 3 with 4) are higher, as expected,
than  factor  pairs  which  do  not  confound  method  (e.g.,  1  with  3  or 
2  with  4).  However,  they  are  not  so  high  that  collapsing  the 


[Table 7. Average intercorrelations among the 12 criterion summary scores across Batch A MOS; column labels and entries are not legible in the source.]




Table 8


Multiple Correlations(a) between Criterion Scores
and Predictor Composite Scores Derived from Each Predictor Set Alone


Predictor composites:
(1) Composite derived from ASVAB factors (k = 4)(b)
(2) Spatial Ability factor (k = 1)
(3) Composite derived from Perceptual/Psychomotor Computer factors (k = 6)
(4) Composite derived from Biodata/Temperament (ABLE) factors (k = 4)
(5) Composite derived from Interest (AVOICE) factors (k = 6)
(6) Composite derived from Job Reward Preference (JOB) factors (k = 3)

Criterion Factor Score                  (1)    (2)    (3)    (4)    (5)    (6)

Hands-On Total Score                    .49    .46    .42    .20    .27    .22
Core Technical Proficiency (raw)        .63    .57    .53    .25    .35    .27
Core Technical Proficiency (resid)(c)   .48    .40    .38    .21    .28    .21
General Soldiering Proficiency (raw)    .66    .64    .58    .25    .35    .29
General Soldiering Proficiency (resid)  .51    .50    .43    .21    .28    .22
Effort/Leadership (raw)                 .33    .27    .27    .33    .25    .19
Effort/Leadership (resid)               .46    .42    .38    .31    .32    .25
Personal Discipline (raw)               .20    .16    .15    .32    .15    .11
Personal Discipline (resid)             .21    .18    .16    .28    .17    .20
Fitness/Bearing (raw)                   .20    .11    .11    .36    .12    .11
Fitness/Bearing (resid)                 .21    .12    .13    .34    .13    .10

(a) Multiple R's are adjusted for shrinkage and corrected for range
restriction, but are not corrected for criterion unreliability.

(b) k = the number of predictor factor scores used in computing the composite.

(c) Residual scores were formed by partialing a paper-and-pencil "method"
construct from Core Technical and General Soldiering Proficiency and by
partialing a rating "method" construct from Effort/Leadership, Personal
Discipline and Fitness/Bearing.
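
The table notes mention two adjustments without giving formulas. The sketch
below shows textbook versions of each, offered only as an assumption about
the general kind of computation involved: the Wherry-type adjustment of a
multiple R for the number of predictors, and the univariate (Thorndike
Case II) correction for direct range restriction. The project may well have
used other formulas, such as a multivariate range restriction correction.

```python
import math

def wherry_adjusted_r(r, n, k):
    """Shrinkage-adjusted multiple R for n cases and k predictors.

    Uses 1 - (1 - R^2)(n - 1)/(n - k - 1), floored at zero.
    """
    r2_adj = 1.0 - (1.0 - r * r) * (n - 1) / (n - k - 1)
    return math.sqrt(max(r2_adj, 0.0))

def correct_range_restriction(r, sd_unrestricted, sd_restricted):
    """Univariate correction of a correlation for direct restriction on the
    predictor; u is the ratio of unrestricted to restricted SDs."""
    u = sd_unrestricted / sd_restricted
    return r * u / math.sqrt(1.0 - r * r + r * r * u * u)

# Hypothetical inputs, not values taken from the report.
print(round(wherry_adjusted_r(0.50, n=500, k=4), 3))
print(round(correct_range_restriction(0.30, sd_unrestricted=10.0,
                                      sd_restricted=8.0), 3))
```

The two adjustments pull in opposite directions: the shrinkage adjustment
lowers R, while the range restriction correction raises it when the sample
is less variable than the applicant population.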


The entries in Table 8 represent the average across all MOS. The
level of validity of ASVAB for the first two factors is about the same as, or
higher than, that usually observed when ASVAB is correlated with training
criteria. ASVAB does predict job performance. For the third factor the
validity of the cognitive tests drops, but is still substantial, and the
validity of the non-cognitive inventories increases. This reversal becomes
even more distinct for factors four and five. Notice that the interest
scales are also a reasonably good predictor of task performance and do not
predict factors three, four, and five as well as the temperament scales. The
mixed nature of factor three is interesting and, along with the confounding of
method variance between the first two and the last three factors, it invites
a consideration of residual scores.


For us at least, one of the most interesting aspects of the table is a
comparison of the factor three raw score with the residualized factor three.
As compared to the correlations with the raw score, the correlations of the
cognitive measures with the residual go up substantially and the correlations
with the temperament composite go down slightly. The correlation of the
interest composite with factor three also goes up when the rating method
factor is partialed out. In general, interest in task content is more
closely associated with task performance than with the more volitional nature
of factors three, four, and five. These differences are not nearly so
pronounced for the other two factors that involve ratings. We think this is
because factor three includes the scales that in fact asked raters to assess
the technical performance of the ratee. It is tempting to infer that raters
are in fact influenced by the actual task competence of ratees but that they
also reflect differences in what might be termed dispositional or volitional
behaviors of the kind predicted by personality/interest measures. Does the
individual work hard, help others when they need it, keep going under adverse
conditions, etc.? In our framework, these are both important components of
performance and they are predicted by different things, but assessment via
ratings cannot separate them very well. Perhaps it is also understandable
why raters would have a difficult time separating them. It would require
almost a mental partial correlation to do so.
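
For reference, the standard first-order partial correlation shows what such
a mental operation would involve. The symbols are illustrative labels rather
than the report's notation: x is a rating, y is a predictor, and z is the
component to be held constant (for example, actual task competence).

```latex
r_{xy \cdot z} \;=\; \frac{r_{xy} - r_{xz}\, r_{yz}}
                         {\sqrt{\left(1 - r_{xz}^{2}\right)\left(1 - r_{yz}^{2}\right)}}
```

A rater would in effect have to judge how much of what is observed is
already explained by z before rating x, which is the separation the
residual scores attempt statistically.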


Incremental  Validities 


The incremental validities for the new cognitive tests and new
noncognitive tests over and above ASVAB alone for each of the performance
factors can be obtained from the results presented in Table 9. While these
comparisons are still at a rather general level and more analyses need to be
done, one reasonable conclusion is that the new battery will provide the
largest increments for the prediction of the "will do" aspects of
performance. Also, we have not yet begun to consider what mix of ASVAB
subtests and new cognitive tests might prove optimal for both selection and
classification.




Table 9


Multiple Correlations(a) between Criterion Scores
and Predictor Composite Scores Derived from Each Predictor Set Plus the ASVAB


Predictor composites:
(1) Composite derived from ASVAB alone
(2) Composite derived from ASVAB and Spatial Ability factor
(3) Composite derived from ASVAB and Perceptual/Psychomotor Computer factors
(4) Composite derived from ASVAB and Biodata/Temperament (ABLE) factors
(5) Composite derived from ASVAB and Interest (AVOICE) factors
(6) Composite derived from ASVAB and Job Reward Preference (JOB) factors
(7) Composite derived from ASVAB and all Trial Battery (TB) factors

Criterion Factor Score                  (1)    (2)    (3)    (4)    (5)    (6)    (7)

Hands-On Total Score                    .49    .51    .51    .49    .50    .49    .53
Core Technical Proficiency (raw)        .63    .65    .64    .64    .55    .64    .67
Core Technical Proficiency (resid)(b)   .48    .49    .49    .49    .50    .48    .52
General Soldiering Proficiency (raw)    .66    .69    .68    .67    .67    .67    .71
General Soldiering Proficiency (resid)  .51    .53    .52    .51    .52    .51    .55
Effort/Leadership (raw)                 .33    .34    .34    .43    .37    .35    .45
Effort/Leadership (resid)               .46    .47    .47    .51    .49    .47    .53
Personal Discipline (raw)               .20    .20    .20    .37    .23    .23    .38
Personal Discipline (resid)             .21    .21    .22    .35    .24    .23    .36
Fitness/Bearing (raw)                   .20    .21    .21    .41    .24    .22    .42
Fitness/Bearing (resid)                 .21    .23    .24    .40    .25    .23    .41

(a) Multiple R's are adjusted for shrinkage and corrected for range
restriction, but are not corrected for criterion unreliability.

(b) Residual scores were formed by partialing a paper-and-pencil "method"
construct from Core Technical and General Soldiering Proficiency and by
partialing a rating "method" construct from Effort/Leadership, Personal
Discipline and Fitness/Bearing.
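
One simple way to read increments off Table 9 is to subtract the ASVAB-alone
multiple R from each combined multiple R. The sketch below does this for
four raw-score rows whose entries are legible in the table; the dictionary
keys are shorthand labels introduced here, and treating a difference in R
(rather than in squared R) as the increment is an assumption about how the
comparison is meant to be made.

```python
# Multiple correlations copied from four raw-score rows of Table 9.
TABLE_9 = {
    "General Soldiering": {"ASVAB": 0.66, "Spatial": 0.69, "Computer": 0.68,
                           "ABLE": 0.67, "AVOICE": 0.67, "JOB": 0.67,
                           "All": 0.71},
    "Effort/Leadership": {"ASVAB": 0.33, "Spatial": 0.34, "Computer": 0.34,
                          "ABLE": 0.43, "AVOICE": 0.37, "JOB": 0.35,
                          "All": 0.45},
    "Personal Discipline": {"ASVAB": 0.20, "Spatial": 0.20, "Computer": 0.20,
                            "ABLE": 0.37, "AVOICE": 0.23, "JOB": 0.23,
                            "All": 0.38},
    "Fitness/Bearing": {"ASVAB": 0.20, "Spatial": 0.21, "Computer": 0.21,
                        "ABLE": 0.41, "AVOICE": 0.24, "JOB": 0.22,
                        "All": 0.42},
}

def increments(row):
    """Difference between each combined multiple R and the ASVAB-alone R."""
    base = row["ASVAB"]
    return {name: round(r - base, 2)
            for name, r in row.items() if name != "ASVAB"}

for criterion, row in TABLE_9.items():
    print(criterion, increments(row))
```

Read this way, the temperament (ABLE) composite adds little for General
Soldiering but adds the most for Personal Discipline and Fitness/Bearing,
which is the "will do" pattern noted in the text.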




Table 10


Results of Stepwise Regressions Within Each Predictor Domain
for the Four Army-wide Performance Constructs
Across All 9 Batch A MOS


Criterion constructs:
(1) General Soldiering (raw score)
(2) Effort and Leadership (resid score)
(3) Effort and Leadership (raw score)
(4) Personal Discipline (raw score)
(5) Phys Fitness/Mil Bearing (raw score)

Predictor Constructs   (1)     (2)     (3)     (4)     (5)

ASVAB FACTORS
Verbal                 0.10    0.03   -0.07   -0.03   -0.11
Quantitative           0.20    0.08    0.03    0.37    0.33
Technical              0.26    0.21    0.21    0.36   -0.35
Speed                  0.03    0.07    0.09    0.04    0.10
ADJ, UNCORR R          0.441   0.280   0.206   0.106   0.161

SPATIAL
Overall Spatial        0.47    0.25    0.14    0.07   -0.05
UNCORRECTED R          0.466   0.253   0.142   0.068   0.047

COMPUTER
Complex Perc Speed    -0.09   -0.06   -0.07    .       [*]
Complex Perc Accy      0.19    0.07    0.09    0.05    [*]
Number Speed/Accy     -0.14   -0.06   -0.09   -0.03    .
Psychomotor           -0.19   -0.08   -0.10    .       [*]
Simp Reaction Accy     0.04    .       .      -0.06    [*]
Simp Reaction Speed    .      -0.07    [*]
ADJ, UNCORR R          0.363   0.149   0.208   0.032   0.071

TEMPERAMENT
Adjustment             0.09    0.04    0.03    0.03    [*]
Dependability          0.04    .       0.06    0.30    0.12
Surgency               0.04    0.23    0.25    .       0.12
Phys Condition        -0.06    .       .      -0.06    0.24
ADJ, UNCORR R          0.129   0.255   0.303   0.303   0.356

INTERESTS
Combat                 0.24    0.20    0.17    0.04    [*]
Machines               .       .       .      -0.04   -0.06
Audiovisual            .       .      -0.04    .       .
Technical              .       0.06    0.08    0.09    0.14
Food Service          -0.10   -0.16   -0.12   -0.06   -0.05
Protective Svc        -0.06    .       .      -0.09    .
ADJ, UNCORR R          0.229   0.235   0.199   0.078   0.119

JOB VALUES
Support                0.03    0.05    0.05    0.10    [*]
Autonomy               0.05    0.07    0.03   -0.06   -0.05
Routine               -0.11   -0.12   -0.09   -0.03   -0.02
ADJ, UNCORR R          0.123   0.150   0.112   0.063   0.097

[*] Row has fewer than five entries in the source; the values are listed in
source order and their column positions are uncertain.



Table 11


Results of Stepwise Regressions Within Each Predictor Domain
for MOS-Specific Core Technical Proficiency
for Each of the 9 Batch A MOS


MOS

Predictor Constructs

11B

13B

19E

31C

63B

64C

71L

91A

95B

ASVA8  PAC7CRS 

Veras ( 

0.20 

0.13 

0.19 

0.16 

3.25 

0.11 

Qusntativ* 

O.U 

0.09  0.15 

0.14 

• 

0.14 

0.38 

0.12 

0.16 

Technical 

0.23 

0.23  0.27 

0.23 

o.ss 

0.34 

•0.‘1 

0.19 

0.11 

Soecd 

0.10 

• 

0.11 

• 

• 

3.08 

0.17 

0.09 

AOJ,  UNCCRR  R 

0.S03 

0.2S4  0.452 

0.427 

0.538 

0.413 

3.441 

3.456 

0.2S2 

SPATIAL 

Svsrall  Spatial 

0.48 

0.33 

0.43 

0.32 

0.41 

0.37 

0.41 

0.38 

3.23 

UNC3RRSCTE9  R 

0.475 

0.334 

0.432 

0.315 

0.412 

0.366 

0.411 

0.380 

0.275 

CCHPUTE.R 

Csna  Pare  Speed 

•0.2S 

•0.10 

• 

•0.08 

•0.14 

a 

a 

Cofflp  Pare  Acey 

0.29 

0.11 

0.16 

0.13 

. 

0.10 

3.27 

0.C9 

0.13 

Nunoer  Saeed/Acey 

•0.11 

•0.11 

•0.20 

•0.25 

•0.08 

•0.07 

•0.22 

•0.20 

•0.19 

Psvehomotor 

•0.13 

•0.17 

•0.11 

•0.09 

•0.20 

•0.10 

•0.‘5 

•0.09 

Sima  Reaction  Aecy 

* 

• 

0.12 

s 

0.08 

0.07 

0.08 

a 

Sima  Reaction  Speed 

« 

* 

a 

• 

a 

• 

* 

AOJ,  UNCCRR  R 

0.406 

0.257 

0.343 

0.253 

0.242 

0.269 

0.326 

0.261 

0.223 

TSPPSRANENT 

Adjustinant 

0.12 

0.14 

0.10 

0.10 

0.08 

Cecendabi  1  ity 

.  . 

0.08 

0.10 

0.10 

0.19 

0.12 

Surgancy 

0.19 

a  a 

0.09 

0.14 

• 

. 

Phys  Condition 

• 

•0.13 

•0.12 

•0.10 

•0.15 

• 

ADJ, UNCORR R

0.143  0.000 

0.129 

0.000 

0.119 

0.000  0.176 

0.211 

0.114 

INTERESTS 

Combat

0.25 

0.25 

0.26 

0.11 

0.09 

0.12  0.18 

Machines 

. 

0.10 

0.13 

0.38 

0.09 

•0.23 

Audiovisual 

. 

• 

•0.11 

• 

•0.08 

Technical 

0.08 

0.10 

. 

0.19 

Food  Service 

•0.22 

•0.16 

•0.11 

•0.10 

•0.12 

•0.07 

-0.06

Protective Svc

•0.11 

•0.10 

•0.14 

• 

• 

ADJ, UNCORR R

0.276 

0.255 

0.218 

0.000 

0.441 

0.135 

0.160  0.039 

0.000 

JOB VALUES

Support

. 

0.14 

Autonomy

0.08 

0.17 

•  a 

0.14 

0.11 

. 

• 

. 

Routine 

•0.15 

•0.14 

•0.21 

•0.10 

•0.07 

•0.12 

• 

-0.08

ADJ, UNCORR R

0.141 

0.201 

0.166  0.000 

0.133 

0.080 

0.038 

0.058 

0.000 



Table 12


Results of Stepwise Regressions
for the Four Army-Wide Performance Constructs
across All 9 Batch A MOS


Criterion Construct

Predictor Constructs

General Soldiering
(raw score)

Effort and Leadership
(resid score)

Effort and Leadership
(raw score)

Personal Discipline
(raw score)

Phys Fitness/
Mil Bearing
(raw score)

ASVAB FACTORS

Verbal

a. 09 

0.33 

•0.06 

• 

-0.10 

Quantitative

0.39 

0.04 

• 

0.05 

• 

Technical

0.12 

0.11 

0.15 

o.or 

•0.03 

Speed 

• 

0.04 

0.06 

0.03 

0.08 

SPATIAL 

Overall  Spatial 

0.25 

0.13 

e 

• 

• 

COMPUTER

Complex Perc Speed

. 

e 

•o.os 

. 

a 

Complex Perc Accy

0.08 

e 

0.04 

a 

Number Speed/Accy

•0.02 

e 

e 

0.03 

Psychomotor

•O.OA 

e 

•0.02 

0 

Simp Reaction Accy

e 

• 

e 

0 

•0.04 

Simp Reaction Speed

•0.03 

e 

a 

a 

•0.05 

TEMPERAMENT

Adjustment

• 

e 

Dependability

0.11 

0.06 

0.11 

0.30 

0.09 

Surgency

•0.04 

0.15 

0.20 

0.03 

0.14 

Phys  Condition 

• 

0.03 

• 

•0.05 

0.22 

INTERESTS 

Combat 

0.13 

0.11 

0.10 

0.04 

Machines

e 

• 

• 

•0.05 

Audiovisual 

e 

•0.02 

•0.04 

•0.03 

0.04 

Technical 

e 

• 

. 

• 

Food  Service 

•0.04 

•0.08 

•0.06 

•0.04 

• 

Protective Svc

0.03 

• 

•0.03 

-0.05 

JOB VALUES

Support

Autonomy

• 

. 

-0.05 

•0.04 

Routine 

•0.03 

•0.04 

•0.03 

• 

• 

ADJUSTED, UNCORRECTED R

0.540 

0.392 

0.366 

o.3ir 

0.385 



Table 13


Results of Stepwise Regressions
for MOS-Specific Core Technical Proficiency
for Each of the 9 Batch A MOS


MOS

Predictor Constructs

11B

13B

19E

31C

63B

64C

71L

91A

95B

ASVAB FACTORS

Verbal

0.17 

• 

0.10 

0.21 

• 

• 

3.38 

3.26 

0.13 

Quantitative

0.39 

• 

• 

0.30 

• 

• 

0.27 

• 

Technical

0.10 

* 

0.16 

e 

0.35 

0.30 

•0.13 

0.12 

Speed

e 

• 

• 

e 

• 

•0.07 

• 

0.13 

• 

SPATIAL 

Overall  Spatial 

0.20 

0.2S 

0.19 

• 

o.u 

0.16 

0.2S 

0.23 

0.22 

COMPUTER

Complex Perc Speed

•0.18 

• 

e 

e 

e 

•0.12 

e 

Complex Perc Accy

0.13 

e 

0.S9 

.0.10 

e 

O.U 

0.15 

0.09 

Number Speed/Accy

e 

e 

•0.09 

e 

• 

e 

e 

•0.11 

Psychomotor

. 

e 

. 

e 

a 

• 

e 

Simp Reaction Accy

e 

0.07 

e 

e 

e 

e 

Simp Reaction Speed

• 

•0.10 

» 

•0.11 

a 

a 

• 

TEMPERAMENT

Adjustment 

•o.os 

e 

e 

•0.09 

• 

e 

e 

a 

• 

Dependability

0.12 

e 

0.10 

0.15 

0.13 

0.07 

0.11 

0.22 

0.12 

Surgency 

e 

« 

. 

• 

• 

• 

• 

• 

• 

Phys  Condition 

e 

• 

•0.09 

• 

•0.06 

• 

e 

•0.13 

• 

INTERESTS 

Combat 

0.15 

0.21 

0.17 

• 

• 

0.16 

Machines 

0.21 

0.32 

•0.16 

Audiovisual 

•O.U 

• 

•0.09 

•0.13 

Technical 

0.12 

Food Service

•0.07 

• 

• 

Protective Svc

•0.08 

-0.08 

• 

• 

JOB PREFERENCES

Support 

• 

• 

0.09 

e 

0.12 

0.09 

Autonomy

0.09 

• 

•0.11 

• 

a 

. 

• 

Routine 

•0.06 

•0.11 

• 

• 

a 

0.07 

• 

ADJUSTED, UNCORRECTED R

O.S60 

0.305 

0.46A 

0.352 

0.591 

0.401 

0.481 

0.507 

0.294 



Correlations between the Predictor Constructs
and the Army-Wide Criterion Constructs
Combined across Batch A MOS
(Corrected for Range Restriction)


[Table body illegible in the source scan; only the final three Job Construct rows survive.]


Job Construct: Autonomy  ®***  ®-®'^  0.02  0.02
Job Construct: Routine  -0.21  -0.20  -0.15  -O.UA  -0.0(
Job Construct: Org & Coworker Support  0.09  0.11  0.10  O.U!i  O.OV


Correlations between the Predictor Constructs
and Core Technical Proficiency
(Corrected for Range Restriction)


[Table body illegible in the source scan; only the final three Job Construct rows survive.]



Job Construct: Autonomy  0.2t  0.22  0.09  0.22  0.25  0.21  0.21  0.21  0.09
Job Construct: Routine  -0.22  -0.10  -0.2/  *0.19  0.21  *0.20  0.19  0.22  -O.M
Job Construct: Org & Coworker Support  O.U  0.11  0.05  -0.02  0.06  O.U  0.20  0.10  -0.01


Moving on from this point, our future validity analyses will be concerned
with:


•  More precise estimates of validity generalization across jobs as
   a function of criterion content and predictor battery composition.

•  Estimation of differential prediction across race and gender
   groups as a function of criterion content and predictor battery
   composition.

•  Estimation of overall selection validity (against a criterion
   composite) as a function of criterion component weights and
   predictor battery composition.

•  Estimation of classification efficiency.


REFERENCES 


Campbell, J. P. (1986). When the textbook goes operational. Paper presented
at the 94th Annual Convention of the American Psychological Association,
Washington, D.C.

Dunnette, M. D. (1963). A modified model for selection research. Journal of
Applied Psychology, 47, 317-323.

Schmidt, F. L., & Kaplan, L. B. (1971). Composite vs. multiple criteria: A
review and resolution of the controversy. Personnel Psychology, 24,
419-434.

Wallace, S. R. (1965). Criteria for what? American Psychologist, 20,
411-417.

Wise, L. L., Campbell, J. P., Hanser, L. M., & McHenry, J. J. (1986). A
latent structure model of job performance factors. Paper presented at
the 94th Annual Convention of the American Psychological Association,
Washington, D.C.




A  LATENT  STRUCTURE  MODEL  OF  JOB  PERFORMANCE  FACTORS:  APPENDIX* 


Jeffrey  J.  McHenry 
Lauress  L.  Wise 

American  Institutes  for  Research 
John  P.  Campbell 

Human  Resources  Research  Organization 

Lawrence  M.  Hanser 
U.S.  Army  Research  Institute 


Presented  at  a  Data  Analysis  Workshop  of  the 
Committee  on  Performance  of  Military  Personnel 

Baltimore 

December  1986 


*  This Appendix supplements the paper, "A Latent Structure Model of Job
Performance  Factors,"  first  presented  at  the  Convention  of  the  American 
Psychological  Association  in  August  1986,  and  available  in  ARI  Research 
Note  813704. 


The  views  expressed  in  this  paper  are  those  of  the  authors  and  do  not 
necessarily  reflect  the  official  opinions  and  policies  of  the  U.S.  Army 
Research  Institute  or  the  Department  of  the  Army. 




ASSUMPTIONS ABOUT JOB PERFORMANCE
IN ENTRY-LEVEL ENLISTED MOS


Performance is not one thing. It is genuinely
multidimensional. Consequently, no one measure can be
identified as the measure of performance. There is no
ultimate criterion. For example, a job simulation, no
matter how elaborate, reliable, and valid, is not an
ultimate measure. All measures have their measurement
flaws, and all measures are constrained to some portion
of the job performance domain.


For the population of first-tour enlisted MOS, it makes
sense to talk about job performance factors that are
defined the same way across jobs (i.e., Army-wide factors)
and performance factors that are specific to a particular
job. The job-specific factors are the latent variables
that capture the differences in task content that require
different knowledges, skills, and abilities (KSAs). The
Army-wide factors reflect tasks and other components of
job performance that do not require different KSAs.


While the content of job-specific factors and the degree
of required performance on any particular factor may
differ across jobs, the latent structure or basic form of
performance is the same for all skilled entry-level jobs
in the military.


To  form  criterion  composites  that  represent  an  overall 
performance  score  for  test  validation  purposes,  or  some 
other  purpose  that  requires  a  single  score,  the  individual 
latent variables must be measured, scored, weighted, and
combined  in  some  fashion.  When  forming  criterion 
composites  for  different  jobs,  it  is  not  the  form  of  the 
latent  variables  that  changes,  but  their  relative  weight. 
Also,  it  is  a  value  judgment  of  the  organization  as  to 
how  much  a  particular  performance  component  should  be 
weighted  for  a  particular  measurement  purpose. 
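
The weighting idea in the paragraph above can be illustrated with a minimal sketch. It assumes each performance construct has already been scored for the same soldiers, standardizes each construct, and forms a weighted average. The function names, the construct keys, and the example weights are hypothetical and are not taken from the report; the report does not specify a scoring or weighting procedure.

```python
from statistics import mean, pstdev


def standardize(scores):
    """Convert raw scores to z-scores (population SD); constant scores map to 0."""
    m = mean(scores)
    s = pstdev(scores)
    if s == 0:
        return [0.0 for _ in scores]
    return [(x - m) / s for x in scores]


def criterion_composite(construct_scores, weights):
    """Weighted composite of standardized construct scores.

    construct_scores: dict mapping construct name -> list of scores,
        one per soldier, in the same soldier order for every construct.
    weights: dict mapping construct name -> relative weight; these are the
        value judgments that may differ by job or measurement purpose.
    """
    missing = set(weights) - set(construct_scores)
    if missing:
        raise KeyError(f"no scores for constructs: {sorted(missing)}")
    lengths = {len(construct_scores[name]) for name in weights}
    if len(lengths) != 1:
        raise ValueError("all constructs must be scored for the same soldiers")
    total = sum(weights.values())
    if total == 0:
        raise ValueError("weights must not sum to zero")
    z = {name: standardize(construct_scores[name]) for name in weights}
    n = lengths.pop()
    return [
        sum(weights[name] * z[name][i] for name in weights) / total
        for i in range(n)
    ]


# Hypothetical example: the same two constructs, weighted differently.
scores = {
    "core_technical": [1.0, 2.0, 3.0],
    "personal_discipline": [3.0, 2.0, 1.0],
}
equal_weights = criterion_composite(
    scores, {"core_technical": 1.0, "personal_discipline": 1.0}
)
technical_heavy = criterion_composite(
    scores, {"core_technical": 3.0, "personal_discipline": 1.0}
)
```

In this sketch the set of constructs stays fixed and only the `weights` dictionary changes from one job or purpose to another, which mirrors the statement that the form of the latent variables is constant while their relative weight varies.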


PRELIMINARY MODEL OF ENLISTED JOB PERFORMANCE




MEASUREMENT METHODS

•  Army-Wide BARS
•  MOS-Specific BARS
•  Combat Performance Prediction Scales
•  Administrative Measures
•  School Knowledge Test
•  Job Knowledge Test
•  Hands-On Performance Test




ARMY-WIDE BEHAVIORALLY-ANCHORED RATING SCALES (BARS)

SUMMARY FACTORS

•  Effort and Leadership
•  Personal Discipline
•  Physical Fitness and Military Bearing
•  Overall Effectiveness

MOS-SPECIFIC BEHAVIORALLY-ANCHORED RATING SCALES (BARS)

SUMMARY FACTORS

•  Core Responsibilities
•  Other Responsibilities




COMBAT PERFORMANCE PREDICTION SCALES


SUMMARY FACTORS

•  Performing Well under Adverse Conditions
•  Avoiding Mistakes


ADMINISTRATIVE MEASURES
SUMMARY "FACTORS"

•  Letters and Certificates
•  Physical Readiness Test Score
•  M16 Qualification Score
•  Articles 15/Flag Actions
•  Promotion Rate Deviation Score




SAMPLE FUNCTIONAL DUTY CLUSTER DEFINITIONS


First  Aid 


Consists  of  items  whose  primary  purpose  is  to 
indicate  knowledge  about  how  to  sustain  life, 
prevent health complications caused by trauma or
environmentally  induced  illness,  including  the 
practice  of  personal  hygiene.  Includes  all  related 
diagnostic,  transportation,  and  treatment  items 
except  those  items  normally  performed  in  a  patient 
care  facility.  Includes  items  related  to  safety 
and  safety  hazards. 


Navigate 

Consists of items whose primary purpose is to
indicate  knowledge  about  how  to  plan  or  execute 
movement  between  points  over  unknown  terrain  either 
cross-country  or  using  road  networks,  or  identify 
the  location  of  objects.  Includes  all  means  of 
determining  direction,  distances,  and  locations 
using  maps  of  all  types,  overlays,  compasses, 
terrain,  celestial  objects,  and  field  expedients. 

NBC 

Consists of items whose primary purpose is to
indicate knowledge about performance when nuclear,
biological, or chemical contaminants and threats
are present, planned, detected, or expected.
Includes maintenance and operation of clothing,
gear, and equipment whose primary purpose is to
counter, protect, or detect NBC threats. Includes
NBC markers. Does not include first aid treatment
of contamination.

Weapons 


Consists  of  items  whose  primary  purpose  is  to 
indicate knowledge about maintenance, preparation,
and  firing  of  small  arms.  Small  arms  are  defined 
as  sized  weapons,  including  automatic  weapons,  up 
to  and  including  caliber  .60  and  shotguns.  Includes 
ancillary  sighting  systems  and  techniques,  stands 
and  mounts,  zeroing  and  techniques  of  fire.  Excludes 
firing  from  aircraft  and  vehicles  where  the  weapon 
is fired by electrical/hydraulic aiming/firing
systems  and  sighting  systems  that  are  part  of  the 
aircraft/vehicle  and  not  part  of  the  weapon. 




FUNCTIONAL DUTY CLUSTERS BY MOS




HANDS-ON, JOB KNOWLEDGE, AND SCHOOL KNOWLEDGE TESTS
SUMMARY FACTORS


•  Core Technical (MOS-specific)
•  Communications
•  Vehicle Operation and Maintenance
•  General Soldiering
•  Identifying Target and Threat Vehicles
   and Aircraft
•  Safety and Survival


REASONS FOR IDENTIFYING A REDUCED SET OF PERFORMANCE CONSTRUCTS
FROM THE WITHIN-METHOD PERFORMANCE FACTORS

•  Desire for performance indices that incorporate
   information from multiple measurement methods

•  Parsimony (high correlations between many of the
   factors)

•  Reliability

•  Construct weighting




JOB PERFORMANCE MEASURE SUMMARY STATISTICS
FOR 11B: INFANTRY

 # VARIABLE               MN    SD   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16  17  18  19  20  21  22  23  24  25

 1 Overall Rating       4.60  0.90   .  90  74  68  77  85  75  65  23  12  17 -35  36  26  14   4  35  25  11  10  23  19  18  12  14
 2 Eff/Ldr Rating       4.41  0.82  90   .  74  65  80  88  80  67  24   8  13 -30  36  30  12   5  36  27  10  13  33  20  20   9  17
 3 Discipline Rating    4.50  0.87  74  74   .  49  55  71  63  66  13   3   7 -39  31  16  10   3  30  22   6   8  24  13  13   5  13
 4 Fitness Rating       4.86  0.89  68  65  49   .  59  66  52  45  17  27   9 -24  22  10   9  -1  10  10  -2  -4  13   6   6   1   1
 5 Job-Spec Tech       32.98  4.58  77  80  55  59   .  86  75  58  23  15  17 -20  22  27  15   5  35  22  12  10  36  21  23   9  16
 6 Job-Spec Other      22.67  3.66  85  88  71  66  86   .  80  67  25   8  14 -28  32  23  10   6  35  26  12  12  33  17  22  11  17
 7 Combat Exemplary     9.02  1.49  75  80  63  52  75  80   .  75  24   8  13 -31  29  28  12   7  37  25   9  16  34  22  23   9  19
 8 Combat Problems     10.03  1.64  65  67  66  45  58  67  75   .  14   8   6 -33  27  20   7  -1  36  24   9  15  31  21  18   8  14
 9 Awards & Certs       3.33  2.18  23  24  13  17  23  25  24  14   .  15  20  -2   4  13   6  -1  14  15  -0  13   9   9   5   4  12
10 Phys. Readiness    273.44 28.00  12   8   3  27  15   8   8   8  15   .  11   2  -6   1  -7  -9   0   5  -7  -0   3  -2  -1  -4  -6
11 M16 Qualific.        2.74  0.57  17  13   7   9  17  14  13   6  20  11   .   1   1  13   6  -0  10   2   3   0  14  10   5   3   6
12 Articles 15          0.39  0.85 -35 -30 -39 -24 -20 -28 -31 -33  -2   2   1   . -45 -10  -1  -6 -10  -9  -8  -6 -10  -1  -9   0  -5
13 Promotion Rate       0.03  0.68  36  36  31  22  22  32  29  27   4  -6   1 -45   .  16   7   7  19  17  12  10  13  14  12  11  17
14 HO Basic            50.50 10.06  26  30  16  10  27  23  28  20  13   1  13 -10  16   .  15   6  44  30  13  27  40  24  20  16  30
15 HO Safety           22.67  3.41  14  12  10   9  15  10  12   7   6  -7   6  -1   7  15   .   2  16   8   1   3  16   7   3   3   4
16 HO Comm             13.15  1.53   4   5   3  -1   5   6   7  -1  -1  -9  -0  -6   7   6   2   .   4   6  -1  -3   0   4   6   2  -1
17 JK Basic            50.73  9.71  35  36  30  10  35  35  37  36  14   0  10 -10  19  44  16   4   .  68  40  42  65  50  40  30  35
18 JK Safety           20.02  4.31  25  27  22  10  22  26  25  24  15   5   2  -9  17  30   3   6  68   .  23  26  47  41  32  13  20
19 JK Comm              4.37  1.47  11  10   6  -2  12  12   9   9  -0  -7   3  -8  12  13   1  -1  40  23   .  16  26  25  19  14  15
20 JK Identify          8.25  2.24  10  13   8  -4  10  12  16  15  13  -0   0  -6  10  27   8  -3  42  26  16   .  31  24  18  16  37
21 SK Basic            72.87 14.89  33  33  24  13  36  33  34  31   9   8  14 -10  18  40  16   0  65  47  26  31   .  63  60  44  43
22 SK Safety            9.51  2.12  19  20  13   6  21  17  22  21   9  -2  10  -1  14  24   7   4  50  41  25  24  63   .  45  34  26
23 SK Comm              5.68  1.67  18  20  13   6  23  22  23  18   5  -1   5  -9  12  20   3   6  40  32  19  18  60  45   .  40  31
24 SK Vehicle           0.73  0.42  12   9   5   1   9  11   9   8   4  -4   3   0  11  16   3   2  30  13  14  16  44  34  40   .  21
25 SK Identify          2.30  1.16  14  17  13   1  16  17  19  14  12  -6   6  -5  17  30   4  -1  35  20  13  37  43  26  31  21   .

N = 503




JOB PERFORMANCE MEASURE SUMMARY STATISTICS
FOR 13B: CANNON CREWMAN


#  VARIABLE

1  Overall Rating
2  Eff/Ldr Rating
3  Discipline Rating
4  Fitness Rating
5  Job-Spec Tech
6  Job-Spec Other
7  Combat Exemplary
8  Combat Problems
9  Awards & Certs
10  Phys. Readiness
11  M16 Qualific.
12  Articles 15
13  Promotion Rate
14  HO Tech.
15  HO Basic
16  HO Safety
17  HO Comm
18  JK Tech.
19  JK Basic
20  JK Safety
21  JK Comm
22  JK Identify
23  SK Tech.
24  SK Basic
25  SK Safety
26  SK Comm
27  SK Vehicle

N = 401


MN  SD  1  2  3  4  5  6  7  8  9  10  11  12  13  14  15  16  17  18  19  20  21  22  23  24  25  26  27


4.S9  0.79  .  BA  71  A1  A2  72  73  A1  11  10  5-25  30  20  19  17  A  2A  18  14  8  3  24  15  12  8  ? 

4.43  0.7A  BA  .  75  A£  AS  74  78  A1  14  A  1-23  25  27  25  14  9  32  20  15  11  5  30  20  IS  5  13 

4.A1  0.78  71  75  .  51  S3  AO  A3  AO  -0  -4  •1-20  2A  12  9  12  4  22  lA  15  4  3  18  14  14  A  lA 

4.95  0.82  A1  A2  51  .  47  53  51  39  7  23  -1-25  lA  8  4  0  3  5  -1  -1  1  -2  4  -4  -1  -4  -S 

23.59  3.55  A2  AS  S3  47  .  80  AO  39  11  10  1  -2  10  35  18  9  -1  25  10  10  17  8  24  8  12  A  4 

23.90  3.08  72  74  AO  S3  80  .  AA  49  A  5  -4  -9  18  25  18  8  1  29  18  15  13  A  2A  14  lA  4  8 

9.00  1.44  73  78  A3  51  AO  AA  .  A3  14  10  3-15  23  20  23  13  3  22  lA  13  A  8  £3  12  7-1  ! 

9.92  1.5A  A1  A1  AO  39  39  49  A3  .  8  7  -3-lA  2A  14  lA  S  12  19  17  10  14  8  15  14  9  5  3 

2.58  1.82  11  14  -0  7  11  A  14  8  .  12  18  0  8  IS  19  IS  -1  11  10  A  5  8  11  A  S  8  £ 

2A1.74  32.70  10  A  -4  23  10  5  10  7  12  .  11  -3  -2  7  2  -7  8  -9  -9-10  5  4  -0  -9-10-12-15 

2.25  0.A9  5  1  -1  -1  1  -4  3  -3  18  11  .  A  1  7  8  12  -3  -4  -5  -A  7  -3  -3  -7  0  3  -3 

0.44  1.03-25-23-29-25  -2  -9-15-lA  9-3  A  .-31  -9  -4  -5  -5  -7-10-12  -7  1  -5  -4  -2  -5  -2 

0.01  0.A3  30  25  2A  lA  10  18  £3  2A  8  -2  1-31  .  A  10  10  3  10  A  5  5  -1  2  5  -£  10  7 

50,71  9.94  20  27  12  8  35  25  20  14  15  7  7  -0  A  .  47  20  II  33  13  7  10  12  3A  18  20  11  9 

48.50  13.00  19  25  9  4  18  18  23  lA  19  2  8  -4  10  47  .  21  8  42  38  29  9  15  40  25  17  IS  ? 


40.  lA 

17  14  12  0 

9  8  13  8  15  -7  12  -5  10  20  21 

• 

11  24  14  11 

9 

m 

•i 

3  »  18  11 

A  • 

i.*! 

10.AO 

1.59 

A  9 

4  3 

-1  1  3  12 

-1  8-3-5 

3  11  8 

11 

.11-2 

6 

5 

m 

/ 

5  -1 

1 

• 

A 

«* 

50.A7 

9.94  2A  32 

22  5  25  29  22  19  11  -9  -4  -7 

10  33  42 

24 

I  .  58  54 

£0 

A4  52  41 

mm 

j/ 

75 

31.91 

OI 

CD 

4^ 

lA  -1 

10  18  lA  17  10  -9  -5-10 

A  12  38 

14 

1  58  .  55 

14  £3 

r<? 

*9  38 

35 

23.58 

4.43 

14  IS 

15  -i 

10  15  13  iO 

A-IO  -A-i2 

5  7  20 

11 

-2  54  55  . 

19 

•1 

41 

38  35 

2A 

1.12 

0.A8 

a  11 

4  1 

17  13  A  14 

5  5  7  -7 

5  10  9 

9 

A  £1  14  10 

• 

•  A 

19 

!•>  !i 

*0 

«  • 
4** 

t  1 

*  • 

7.12 

2.25 

3  5 

3  -2 

8  A  8  3 

8  4-31 

-1  12  15 

3 

5  20  2  21 

4«a 

• 

20 

•9 

10 

3 

50.32 

9.84 

24  30 

18  4  24  26  3  15  11  -0  -3  -5 

2  3A  40 

25 

7  A4  52  41 

19  £0 

• 

»A  •• 

Om 

38  40 

23.17 

5.27 

15  20 

14  -4 

8  14  12  14 

A  -3  -7  -A 

5  18  25 

20 

5  52  49  28 

«  A 

21 

63 

.  !I  40  52 

3.44 

?  t- 

12  15 

14  -1 

12  lA  7  9 

5-10  0  -2 

-2  20  17 

18 

-1  41  38  35 

lb 

•  A 

*9/ 

31  . 

:3  3A 

-  ** 
i#a  »  • 

1.21 

3  5 

0  -4 

A  4  -1  5 

8-12  3  -5 

10  ii  15 

•  t 

*  « 

I  37  25  2A 

i4 

10 

73 

iO  22 

• 

AA 

«•  * « 

1.07 

9  12 

lA  -3 

4  3  18 

2-15  -3  -3 

7  9  9 

24 

3  35  £;  ;? 

1 1 

•  • 

m 

0 

52  36 

72 



JOB PERFORMANCE MEASURE SUMMARY STATISTICS
FOR 19E: ARMOR CREWMAN

#  VARIABLE  MN  SD  1  2  3  4  5  6  7  8  9  10  11  12  13  14  15  16  17  18  19  20  21  22  23  24  25  26  27  28


1  Oftrall  BitiiM  4.62  0.7B  .  84  72  SB  72  S3  69  61  12  20  8-37  41  12  IS  16  4  29  23  22  19  10  27  26  18  22  13  7 

2  Eff/Ltfr  8<tiM  4.38  0.74  84  .  68  S5  76  SO  80  65  16  21  8-32  41  17  26  19  17  31  34  31  27  14  33  32  22  22  23  11 

3  SiseiPliM  8:m  4.S0  0.83  72  68  .  4S  S3  41  SS  64  -1  12-14-3S  38  6  19  14  2  21  23  18  17  6  27  18  8  22  14  -2 

4  Fitatis  8«tiM  4.76  0.82  S8  99  49  .  44  39  43  36  10  43  -0-19  28  8  2  16  -3  1  5  10  5  -40  -0  -0  4  2  2 

5  .‘86-Sate  Tfch  3.19  3.20  72  76  S3  44  .  79  71  SS  10  14  17-31  34  3  17  19  13  29  S  26  19  3  27  3  18  15  20  a 

6  Jak-Satc  Otftir  14.7!  1.89  3  SO  41  39  79  .  SO  41  9  13  13-18  19  15  9  13  2  8  ?  12  2  4  IS  12  S  <>  «  1 

7  Cartat  SaMlrr  8.38  1.36  69  80  99  43  71  SO  .63  IS  18  8-3  34  IS  3  19  14  3  3  19  20  7  3  3  19  10  18  2 

8  Caalat  PrakliM  9.80  1.47  61  69  64  36  JS  41  63  .  -1  7  4-31  29  13  3  13  6  24  18  3  13  8  24  18  17  15  16  -1 

9  AMarOs  6  Carts  2.3  1.60  3  16  -1  10  10  9  19  -1  .  IS  19  -7  13  6  4  -3  13  9  7  -0  10  -2  3  12  3  4  8  8 

10  Phrs.  Stadiaatf  249.41  3.11  20  21  12  43  14  13  18  7  IS  .  -i-10  10  -3  -3  4  2  -6  0  -6  4  1  -4  1  2  -2  2  -7 

11  .116  a«alifie.  2.40  0.68  8  8-14  -0  17  13  8  4  19  -1  .  14  -1  7  7  3  10  11  12  13  17  31  10  6  12  2  16  -1 

12  Artielts  IS  0.3  0.77-37-3-39-19-31-18-3-31  -7-10  14  .-43  -9  -8-16  1-13-17-17  -7  1-19-13-12  -0  -7  -6 

13  PraMtiOA  Sati  0.03  O.SB  41  41  3  3  34  19  34  29  13  10  -1-43  .  10  7  19  12  14  24  3  21  2  17  3  18  6  IS  1 

14  HO  Tteh.  90.00  9.99  12  17  6  8  3  IS  19  IS  6  -3  7  -9  10  .  18  24  3  36  27'  27  13  18  3  18  ?  2  19  0 

19  HO  Sane  3.16  2.48  IS  3  IS  2  17  9  3  3  4-3  7  -8  7  18  .  21  3  3  3  3  21  18  21  3  11  4  1?  -.7 

16  HO  Safttr  21.89  2.99  16  19  14  16  19  IS  19  13  -3  4  3-16  19  24  21  .  14  3  18  18  10  6  IS  13  S  S  17  6 

1?  'HO  Com  3.3  7.99  4  17  2  -3  13  2  14  6  13  2  10  1  12  3  3  14  .  3  3  3  3  11  20  3  13  3  3  3 

18  JX  Tech.  50.00  9.99  3  31  21  I  3  8  3  24  9  -6  11-13  14  363033  .6092  49  34  64  60443  42  7 

19  jS  Basic  42.16  7.3  3  34  3  9  3  7  3  18  7  0  12-17  24  3  3  18  3  60  .  69  3  30  6S  67  46  41  43  6 

20  JS  SaPstT  21.19  4.13  3  31  18  10  3  12  19  21  -0  -6  13-17  3  3  3  18  3  3  65  .  44  34  46  51  37  3  3  5 

21  M  Cam  11.3  3.99  19  27  17  9  19  2  3  13  10  4  17  -7  21  13  21  10  3  45  3  44  .  16  45  51  34  30  24  2 

3  j«  IdintifT  10.05  1.3  10  14  6  -4  8  4  7  8  -2  I  31  1  2  18  18  6  11  34  30  34  Is  .  24  3  3  18  37  3 

3  a  :*CS.  54.S4  9.66  3  3  27  0  3  IS  3  24  12  -4  10-19  17  3  21  15  20  64  65  *6  45  24  .  75  53  59  48  21 

24  SX  Basic  34.94  3.44  26  3  18  -0  S  12  29  18  12  1  6-13  3  18  3  13  3  60  67  51  !l  3  75  .  68  47  47  12 

3  SX  Safttr  8.18  2.14  18  3  8  -0  18  8  19  17  3  2  12-13  18  9  11  9  13  44  46  3  34  3  53  68  .  38  3  4 

26  SX  Com  7.59  1.80  3  3  3  4  15  9  10  15  4  -2  2  -0  6  2  4  5  3  38  41  26  30  18  59  47  38  .  24  14 

27  SX  VtHicli  0.S4  0.50  7  11  -2  2  6  1  2  -1  B  -7  -1  -6  I  0  -0  6  3  7  6  5  2  3  21  12  4  14  9  . 

3  SX  IdoBtifr  3.01  0.96  18  3  14  2  20  9  18  16  8  2  16  -7  IS  19  19  17  3  42  43  3  24  37  48  47  3  24  .  9 

1*  335 




JOB PERFORMANCE MEASURE SUMMARY STATISTICS
FOR 31C: SINGLE CHANNEL RADIO OPERATOR

#  VARIABLE  MN  SD  1  2  3  4  5  6  7  8  9  10  11  12  13  14  15  16  17  18  19  20  21  22  23  24  25  26  27  28  29


1  Ovtrill  Ratios  4.73  0.79  .  83  73  64  74  46  64  44  17  11  2-31  30  20  24  IS  IS  -2  24  17  ?  14  3  13  19  lO  14  2  -2 

2  Eff/Ldr  Ratios  4.48  0.72  83  .  48  37  81  71  48  43  18  12  7-31  30  24  21  21  IS  2  30  3  14  16  6  13  2  12  20  12  4 

3  Biteialiot  Rtos  4.44  0.88  73  48  .  S2  S4  S8  S3  40  4  4-11-32  24  10  14  7  10  -1  20  IS  4  13  6  7  9  -4  10  -2  -S 

4  ritotii  Ratios  S.OS  0.88  44  S7  32  .  47  40  42  42  11  34  -4-2S  24  12  8  10  4  -2  1  2  0  8  -4  6  -5  -S  -2  -9-12 


5  jal-Sotc  Tieo  14.27  2.01  74  81  S4  47  .  74  44  S7  14  4 

4  Jil-SfM  Ototr  14.37  2.09  44  71  38  40  74  .  S4  48  3  -3 

7  Coalat  caaalrr  9.09  1.S4  44  48  33  42  44  S4  .  77  11  1 

8  CaiRat  Pr«41m  10.47  1.71  44  43  40  42  37  48  77  .  9  -1 

9  AmotOi  4  Carts  2.14  1.73  17  18  4  11  14  3  11  9  .  23 

10  Phrs.  RaaSiotss  239.34  29.39  11  12  4  34  4  -3  1  -1  S  . 

11  H14  Saalifie.  2.14  0.77  2  7-11  -4  3  -3  3  -2  10  4 

12  Artielis  15  0.34  0.84-31-31-32-25-14-17-21-22  2-11 

13  PrtMtioo  Rata  -0.02  0.34  30  30  24  24  22  22  17  14  12  4 

14  HO  Tael.  78.44  9.49  20  24  10  12  20  11  4  4  9  1 

13  MO  Basie  21.23  3.84  24  21  14  8  24  18  13  14  12-10 

16  HO  Safatr  20.13  3.99  IS  21  7  10  20  IS  18  11  4  0 

1?  HO  Zorn  14.73  4.39  IS  13  10  4  11  1  23  13  3  1 

13  HO  Valieia  11.73  1.31  -2  2  -I  -2  -I  0-7-0  2-4 

19  A  Taea.  37.14  11.48  24  30  20  1  29  17  24  22  10  -4 

20  Jl  Baste  22.12  4.41  17  28  IS  2  30  22  30  24  10  -8 

Z1  A  Safatr  23.31  4.43  9  14  4  0  14  8  IS  3  11  4 

r  A  Z»m  10.12  2.74  14  16  13  8  IS  9  19  14  -0  1 

2  A  *;tBiela  4.54  1.82  3  6  6  -4  8  2  11  -1  -3  2 

24  JK  Idaatifr  6.72  2.13  13  13  7  6  IS  3  12  3  8  -6 

2  St  Taea.  77.87  15.43  19  2  9  -5  19  5  18  13  8  -4 

24  SB  Basie  10.93  2.74  10  12  -4  -8  IS  2  9  3  12  -0 

27  SX  Safatr  11.08  2.81  14  20  10  -2  16  9  14  9  4  1 

23  3  VaBieia  3.83  1.84  2  12  -2  -9  13  3  10  0  4-13 

29  a  Identifr  1.16  0.93  -2  4  -9-12  -l  -9  7  -3  6  -5 

H«  239 


3-14  S  20  24  20  11  -1  29  30  16  IS  3  15  19  15  16  13  -1 

-3-17  22  11  18  13  1  0  17  2  8  9  2  5  5  :  9  3  -9 

3- 21  17  6  13  18  2  -7  24  30  15  19  11  12  18  9  14  10  7 

-2-2  14  4  14  11  13  -0  2  24  3  14  -1  3  15  5  9  0  -3 

10  2  2  9  2  4  3  2  10  10  11  -0  -3  8  8  2  4  4  6 

4- 11  4  1-10  0  1-6-4-8  4  1  3-8-4-0  1-13  -5 

.  4  3  4  5  10  7  3  7  10  8  -4  -6  5  9  4  4  11  10 

4  .-34  -9  -3  -7-2  -3-16  -9-13-20-10  -3-1  i  -4-12  -4  -3 

3- 34  .  8  2  2  9  5  18  17  10  19  13  i:  13  15  2  4-0 

4- 9  8  .22  2  9  42  21  2  21  2  13  39  21  24  9  S 

3- 3  22  .  18  2  8  31  31  18  IS  5  2!  27  24  27  10  15 

10  -7  21  2  18  .  2  14  10  21  13  9  6  8  11  1C  19  4  ? 

7-2  9  2  27  2  .  1  34  29  21  2  21  2  24  17  11  5  25 

5  -3  5  9  8  16  1  .  11  9  10  2  -6  7  2  16  14  i:  11 

7- 14  18  42  31  10  34  11  .  40  59  6v  37  2  72  50  it  25 

10  -9  17  21  31  21  2  9  40  .  33  50  2  31  49  42  43  40  20 

8- 13  10  2  18  13  21  10  59  2  .  50  2  30  44  eo  48  36  24 

-4-2  19  21  15  9  38  2  40  50  50  .  32  19  *4  Ss  3s  27  I* 

-6-10  13  2  5  6  21  -6  r  2  3  32  .  17  20  ;4  lb  13  11 

5  -3  12  15  22  8  2  7  2  31  2  19  17  2  21  11  40 

9- 11  13  39  2  11  26  2  2  49  44  44  20  27  .  62  56  43  2 

4  -4  IS  2  24  to  17  14  49  42  40  34  14  3  62  .  5s  42  26 

4- 12  12  24  2  19  11  14  SO  43  48  36  16  21  3  56  .  41  2 

11  -4  4  9  10  4  5  12  44  40  36  2  13  11  48  42  ;i  .2 

10- 3-0  8  1!  9  3  11  3  20  24  19  11  40  2  3  2  2  . 




JOB PERFORMANCE MEASURE SUMMARY STATISTICS
FOR 63B: LIGHT WHEEL VEHICLE MECHANIC

#  VARIABLE  MN  SD  1  2  3  4  5  6  7  8  9  10  11  12  13  14  15  16  17  18  19  20  21  22  23  24  25  26


1  Oterall  BatiM  4.35  0.84  .  86  75  37  75  75  tf  45  20  7  -4-24  24  11  -1  5  10  20  15  Z  11  21  Z  21  15  19 

2  Eff/Ur  latiaa  4.31  0.83  86  .  75  30  84  78  69  66  21  1  -5-23  3  19  -1  3  12  S  16  Z  18  26  Z  19  14  Z 

3  OiieiPliBt  fftM  4.54  0.8B  75  75  .  51  63  65  39  66  IS  2  -8>Z  26  10  -5  7  •  11  5  19  3  14  Z  20  13  14 

4  Fitani  iatia^  4.82  0.86  57  SO  51  .  38  49  44  41  13  31  2-20  20  -2  -2  8  7  -0  2  8  -0  >2  16  14  13  8 

5  Jai-SNC  TteJi  Z.42  4.10  75  84  63  38  .  78  65  S7Z*!  -5-16  16  3  1  3  13  3  21  26  19  Z  21  18  6  3 

6  Jol-Sate  Otktr  3.19  3.Z  3  78  65  49  78  .  68  3  18  5  -8-18  17  Z  4  4  Z  18  17  Z  13  21  IS  18  9  20 

7  Caakat  eaaalr?  8.87  1.61  68  69  39  44  65  68  .  69  14  4  -7-16  17  Z  0  9  9  16  11  3  8  3  18  14  S  13 

a  Caikat  FrtkitM  9.92  1.86  65  66  66  41  57  3  69  .14-0  -6-3  3  10  -3  4  9  17  11  3  7  19  21  IS  IS  18 

9  AiiaHs  i  Carts  2.31  1.81  3  3  13  Z  21  18  14  14  .  4  2-11  7  11  -5  -0  7  7  2  Z  11  Z  14  10  3  IS 

10  Saaaiaaii  23.47  31.3  7  1  2  31  -1  5  4  -0  4  .  10-10  15  1  8  3  -i  -7-12  -2  -9-10  10-2-* 

11  .916  SMlific.  2.19  0.3  -4  -5  -8  2  -5  -8  -7  -6  2  10  .  1  -9  -2  5  -4  -0  -6  3  3  2  -2  -2  2  -0  4 

Z  Artxelfs  IS  O.Z  0.85-24-23-3-20-16-18-16-20-11-10  1  .-36  -3  -2  -2  -4  -7  -5  -6  -0  -6-11  -7-13  -3 

13  ProMtxaa  Bata  0.04  O.Z  24  3  3  3  16  17  17  3  7  15  -9-36  .  -5  -4  -2  -1  13  9  4  8  Z  16  15  8  12 

14  HO  7ae6.  110.11  6.84  11  19  10  -2  8  Z  13  10  11  1  -2  -3  -5  .  8  6  18  3  3  19  Z  Z  19  16  4  24 

15  HO  3asxe  34.96  4.09  -1  -1  -5  -2  1  4  0  -3  -5  8  5  -2  -4  8  .  10  7  6  Z  14  Z  10  7  15-114 

16  HO  SafttT  21.92  3.3  5  3  7  8  3  4  9  4  -0  3  -4  -2  -2  6  10  .  2  2  5  13  1  I  2  7-7-0 

17  HO  Vtaxexa  ll.Z  1.84  10  Z  9  7  Z  Z  9  9  7  -1  -0  -4  -1  18  7  2  .  15  6  4  11  17  6  6  2  13 

18  Jl  Tack.  68.61  11.93  3  3  11  -0  3  18  16  17  7  -7  -6  -7  Z  3  6  2  15  .  Z  47  Z  67  50  Z  36  59 

19  Jl  Basie  24.36  4.69  IS  16  5  2  3  17  11  11  2-Z  3  -5  9  3  Z  5  6  Z  .  45  44  47  41  36  Z  *4 

3  JR  Safttr  18.9!  3.05  Z  Z  19  8  3  Z  3  3  Z  -2  3  -6  4  19  14  18  4  47  45  .  38  40  36  Z  3  Z 

21  Jl  Viaxclt  15.31  4.03  11  18  3  -0  19  Z  3  7  11  -9  2  -0  8  Z  Z  1  11  Z  44  3  .  56  Z  2:  24  4« 

Z  31  Taea.  56.00  12.89  21  26  14  -2  Z  21  3  19  13-10  -2  -6  13  Z  10  I  17  67  47  40  56  .  Z  *7  30  s9 

Z  31  Basis  16.56  4.24  Z  Z  Z  16  21  18  18  21  14  1  -2-11  16  19  7  2  6  3  4l  36  Z  Z  .  61  5C  56 

24  37.  Safftr  6.02  1.74  Z  19  3  14  18  18  14  IS  10  0  2  -7  15  16  IS  7  6  Z  36  Z  3!  47  6;  .3  50 

Z  SR  Com  0.90  0.3  15  14  13  13  6  9  8  18  8  -3  -0-13  8  4  -1  -7  2  3  Z  20  24  30  50  Z  .  Z 

26  3X  Oiaxcii  24.10  5.54  19  Z  14  8  3  3  13  18  18  -4  4  -6  13  24  14  -0  13  Z  44  Z  49  69  56  50  Z  . 

H*  403 




JOB PERFORMANCE MEASURE SUMMARY STATISTICS
FOR 64C: MOTOR TRANSPORT OPERATOR

#  VARIABLE  MN  SD  1  2  3  4  5  6  7  8  9  10  11  12  13  14  15  16  17  18  19  20  21  22  23  24


1  Gvtrall  RatiM  4.SZ  0.7B  .  84  7B  43  72  39  48  SB  13  11  4*30  33  8  20  IS  14  17  19  3  13  12  Id  14 

2  cff/Ur  RatiM  4.34  0.7S  84  .  77  S9  78  49  74  SB  17  9  4-2S  31  14  29  18  23  22  24  7  21  17  15  21 

3  OisaiRliat  Rtaf  4.S3  0.81  78  77  .  SZ  47  SI  SB  S4  10  3  -^^9  35  4  14  14  IS  IS  19  1  14  12  IS  14 

4  FitaMi  Ratitj  4.74  0.r  43  S9  S2  .  S4  39  44  3S  3  28  3-20  21  -2  8  14  S  4  4  2  5  4  7  -3 

5  Itl-SMC  Taci  29.41  3.74  72  78  47  S4  .  78  4S  32  13  4  7-21  3  9  14  19  17  20  19  5  14  17  12  IS 

4  JM-Omc  Otitr  17.79  2.S2  S9  49  SI  39  78  .  43  41  18  4  13-lS  19  12  11  14  17  14  19  4  13  17  7  ii 

7  CtMat  EiMlrr  BJO  1.4S  48  74  SB  44  4S  43  .  4S  12  4  11-21  S  3  19  14  20  IS  20  S  IS  3  iO  15 

8  CMlat  PrtdlMS  9.50  1.43  SB  SB  34  3  32  41  4S  .  8  -3  2-24  24  12  15  10  14  17  3  7  15  14  20  19 

9  AMrtf  I  Cans  3.12  2.08  13  17  10  3  13  18  12  8  .  4  11  S  12  8  4  S  >3  -2  1  4  3  4  -1  2 

10  Furs.  Rtalitnt  248.48  37.70  11  9  3  3  4  4  4  -3  4  .  3  -4  -1  -1  3  2  -4  -4  -4  2  -2  -5  0  -8 

11  1114  Baaiific.  2.09  0.3  4  4  -2  3  7  13  11  2  11  3  .  4  -9  9  13  5  7  5  3  -0  -1  -3  I  -1 

12  A«iela$  15  0.44  0.98-30-25-29-20-21-15-21-24  5  -4  4  .-34  -1-11-11  -7-13-12  0  -5  -e-iZ  -* 

13  Fmottra  Rati  -0.01  0.S7  S  31  3  21  S  19  S  3  12  -1  -S-3  .  10  9  10  9  12  11  5  11  9  8  II 

14  HO  Basie  43.44  10.14  8  14  4  -2  9  12  3  12  8  -1  9  -I  10  .  3  10  44  31  30  7  3  21  a  32 

15  «  Safitr  83.73  9.84  20  20  14  8  14  11  19  IS  4  3  13-11  9  3  .  14  3  31  24  4  24  19  14  24 

14  HO  Ochieli  33.30  4.19  IS  18  14  14  19  14  14  10  5  2  5-11  10  10  14  .  5  8  15  3  10  11  I  II 

17  A  Basie  3.3  5.82  14  3  IS  5  17  17  3  14  -3  -4  7  -7  9  44  3  5  .  a?  54  10  47  39  20  49 

13  JK  Safttr  33.42  5.42  17  3  IS  4  20  14  15  17  -2  -4  5-13  12  31  31  8  47  .  49  4  42  47  3  49 

19  A  Vifeieii  3S.40  7.70  19  24  19  4  19  19  20  3  1-4  3-12  ii  30  24  iS  54  49  .  11  49  40  3  55 

20  A  iMitifr  2.15  i.4i  3  7  1  2  5  4  5  7  4  2  -0  0  5  7  4  3  10  4  11  .  17  10  -2  13 

21  A  Sasic  14.41  4.34  13  21  14  5  14  13  18  15  3  -2  -I  -5  11  28  24  iO  47  42  49  17  .  54  43  aS 

2  a  SafttT  4.44  1.93  12  17  12  4  17  17  8  14  4  .5  -3  -8  9  21  19  11  3  47  40  10  54  .  34  5« 

S  »  CiM  0.89  0.3Z  14  15  15  7  12  7  10  20  -i  0  1-12  8  4  14  1  20  23  3  -2  43  34  .3 

24  a  Vclitelt  55.72  10.07  14  21  14  -2  15  14  15  19  2  -8  -1  -4  11  32  24  11  49  49  55  13  43  59  3?  . 

H«  43 




#  VARIABLE

1  Overall Rating
2  Eff/Ldr Rating
3  Discipline Rtng
4  Fitness Rating
5  Job-Spec Tech
6  Job-Spec Other
7  Combat Exemplry
8  Combat Problems
9  Awards & Certs
10  Readiness
11  M16 Qualific.
12  Articles 15
13  Promotion Rate
14  HO Tech.
15  HO Basic
16  HO Safety
17  JK Tech.
18  JK Basic
19  JK Safety
20  SK Tech.
21  SK Basic
22  SK Safety
23  SK Comm
24  SK Vehicle

H>  3S3 


JOB PERFORMANCE MEASURE SUMMARY STATISTICS

FOR 71L: ADMINISTRATIVE CLERK
MN  SD  1  2  3  4  5  6  7  8  9  10  11  12  13  14  15  16  17  18  19  20  21  22  23  24


4.92  0.85  .  83  71  57  72  13  13  59  20  24  4-3  20  17  14  3  2  15  17  2  13  11  5  1C 

4.14  0.78  83  .  a  51  a  IS  70  10  2  19  2-19  19  S  14  2  29  17  18  3  17  9  7  11 

5.01  0.88  71  a  .  47  13  55  SB  SB  13  13  4-2  19  3  10  -3  2  IS  11  20  7  3  1  4 

S.a  0.89  57  SI  47  .  40  39  55  49  3  S  5-2  3  3  7  -3  1  2  2  -1  0  -5  0  -2 

19.3  2.a  a  a  13  40  .  7|  54  3  8  7  -5-2  2  24  8  -2  3  II  II  3  10  9  I  7 

18.57  3.13  13  IS  3  3  71  .  3  41  10  13  -t-21  17  2  13  1  2  IS  II  3  3  9  10  8 

8.74  1.83  13  a  3  3  54  3  .2  24  19  8-15  18  9  3  11  13  2  17  14  13  3  8  2 

lO.a  l.a  3  M  3  49  3  41  a  .  21  II  7-2  13  12  14  I  11  3  12  15  12  9  1  14 

2.12  l.a  3  2  13  20  8  10  24  2  .  17  3  -4  9  -0  10  -1  -0  5  11  -0  -2  -2  5  1 

210.40  2.3  24  19  13  2  7  13  19  II  17  .11-9  5  1  I  5  0  -5  3  5  4  12  2  3 

1.31  0.3  4  2  4  5  -5  -1  3  7  3  11  .  3  2  -4  2  8  -I  7  3  -3  2  -7  -l  12 

0.2  0.12-23-19-27-2-2-21-15-2  -4  -9  3  .-42-13  -5  I-IO  -7  2-10  -5  -S  -5  4 

0.01  0.41  3  19  19  3  2  17  18  13  9  5  2-42  .12  5  2  I  I  9  5  ?  a  4-0 

tt.09  14.3  17  2  3  3  24  2  9  12  -0  1  -4-13  12  .  3  13  58  34  2  58  2  2  7  11 

18.51  5.3  14  14  10  7  8  13  3  14  10  1  2-5  5  3  .  43  2  48  2  2  21  17  1  2 

3.54  4.3  3  2  -3  -3  -2  1  II  I  -I  5  8  1  2  13  43  .  11  2  2  7  13  10  0  17 

42.2  9.2  2  2  2  1  3  2  13  11  -0  v  -l-iO  I  2  2  11  .  47  48  2  42  24  17  17 

2.2  5.11  15  17  IS  2  II  15  a  21  5-5  7-7  I  34  48  2  47  .  50  40  44  27  27  2 

11.24  3.01  17  18  11  2  II  II  17  2  11  8  3  2  9  2  2  2  48  50  .  43  2  2  19  2 

44.99  9.a  21  2  3  -1  2  21  14  IS  -0  5  -3-10  5  2  2  7  a  40  43  .  44  2  IS  II 

9.90  2.2  13  17  7  0  10  8  13  13  -2  4  2  -5  7  2  2  13  42  44  2  44  .  2  13  31 

4.2  1.3  11  9  8  -5  9  9  3  9  -2  12  -7  -5  I  2  17  10  24  2  2  2  2  .  4  15 

0.2  0.48  5  7  1  0  I  10  B  1  5  2  -1  -5  4  7  I  0  17  2  19  15  18  4  .11 

2.71  1.21  10  11  4  -2  7  8  a  14  1  8  13  4  -0  11  2  17  17  2  2  II  31  15  11  . 




#  VARIABLE

1  Overall Rating
2  Eff/Ldr Rating
3  Discipline Rtng
4  Fitness Rating
5  Job-Spec Tech
6  Job-Spec Other
7  Combat Exemplry
8  Combat Problems
9  Awards & Certs
10  Readiness
11  M16 Qualific.
12  Articles 15
13  Promotion Rate
14  HO Tech.
15  HO Basic
16  HO Safety
17  JK Tech.
18  JK Basic
19  JK Safety
20  JK Vehicle
21  JK Identify
22  SK Tech.
23  SK Basic
24  SK Safety
25  SK Vehicle

H»  372 


JOB  PERFORMANCE  MEASURE  SUMMARY  STATISTICS 


FOR  91A:  MEDICAL  SPECIALIST 
MN  SD  1  2  3  4  5  6  7  8  9  10  11  12  13  14  15  16  17  18  19  20


4.61  0.82  .  86  78  60  67  62  71  70  22  IS  -2-29  32  17  6  13  28  24  23  4 

4.40  0.77  86  .  76  56  73  67  73  71  24  13  -4-30  33  20  9  19  26  2S  21  >2 

4.S4  0.91  78  76  .  47  60  47  38  69  12  7  -8-29  31  15  11  13  28  21  20  -4 

4.74  0.92  60  36  47  .  4!  38  49  47  10  39  0-20  18  3  -0  -0  3  7  4  7 

23.09  3.24  67  73  60  41  .  67  35  54  IS  6  -1-27  26  18  2  13  22  16  14  -3 

18.47  2.35  62  67  47  38  67  .  64  31  28  7  9-17  27  10  6  16  18  25  20  5 

9.20  1.48  71  73  38  49  33  64  .  79  30  9  9-20  26  16  10  IS  2  23  ^  I 

10.11  i.77  70  71  69  47  34  51  79  .23  5  -3-28  30  14  6  11  24  22  23  -i 
3.04  2.01  22  24  12  to  IS  28  30  23  .  14  34  -8  13  3  7  22  4  10  3  II 

235.71  31.94  15  13  7  39  6  7  9  3  14  .  17-11  -2  4  -6  -5  -3  -7  -2  3 

2.08  0.78  -2  -4  -8  0  -1  9  9  -5  34  17  .  -I  -4  3  0  8  -8  5  -7  -0 

0.41  0.89-29-30-29-20-27-17-20-28  -9-11  -1  .-33-10  1  -7-10  -7  -6  12 

-0.00  0.58  32  33  31  18  26  27  26  30  13  -2  -4-33  .  10  9  7  16  20  9  -9 

30.48  10.02  17  20  13  3  18  10  16  14  3  4  3-10  10  .  16  34  39  27  30  2 

9.57  3.00  6  9  11  -0  2  6  10  6  7  -6  0  1  9  16  .  17  21  37  21  ? 

33.32  4.30  13  19  IS  -0  13  16  15  11  22  -5  3  -7  7  34  17  .  S  32  33  2 

35.32  13.71  28  26  3  3  22  18  22  24  4  -3  -8-10  16  39  21  32  .  S4  73  ■“ 

15.19  3.63  24  23  21  7  16  23  23  22  10  -7  5  -7  20  27  37  22  54  .33  3 

42.71  7.3S  25  21  20  4  14  20  22  22  3  -2  -7  -6  9  30  21  33  76  33  .12 

2.42  1.04  4  -2  -4  7  -3  3  1  -I  11  3  -0  12  -9  2  9  3  13  3  12  . 

6.62  2.32  8  13  6  1  3  IS  18  9  18  -5  12  -5  11  13  14  17  16  24  16  10 

91.65  17.57  28  33  33  -l  322322r  4-8  -4-13  18  44  17  30  67  41  33  2 

2.04  0.78  12  14  14  2  3  11  20  28  11  -3  -2-16  11  8  18  10  20  23  21  -3 
3.77  1.56  IS  16  11  -4  IS  18  17  16  12  -8  2  -6  14  28  22  33  48  38  49  6 

4.51  1.62  6  9  10-15  7  16  12  12  6  -7  2  -1  II  14  11  18  22  Z  2l  6 


[Columns 21 through 25 of the 91A correlation matrix are not legible in the source copy.]



#  VARIABLE

1  Overall Rating
2  Eff/Ldr Rating
3  Discipline Rtng
4  Fitness Rating
5  Job-Spec Tech
6  Job-Spec Other
7  Combat Exemplry
8  Combat Problems
9  Awards & Certs
10  Phys. Readiness
11  M16 Qualific.
12  Articles 15
13  Promotion Rate
14  HO Tech.
15  HO Basic
16  HO Safety
17  HO Comm
18  HO Vehicle
19  JK Tech.
20  JK Basic
21  JK Safety
22  JK Comm
23  JK Vehicle
24  JK Identify
25  SK Tech.
26  SK Basic
27  SK Safety
28  SK Comm
29  SK Vehicle
30  SK Identify

H<  506 


JOB PERFORMANCE MEASURE SUMMARY STATISTICS
FOR 95B: MILITARY POLICE

MN  SD  1  2  3  4  5  6  7  8  9  10  11  12  13  14  15  16  17  18  19  20  21  22  23  24  25  26  27  28  29  30


4.74  0.80  .  87  69  70  78  68  74  70  18  22  13-28  21  15  18 

4.50  0.73  87  .  71  61  77  72  73  68  20  17  11-22  19  14  21 

4.71  0.77  69  71  .  46  65  48  55  63  6  7  4-27  26  6  9 

4.90  0.84  TO  61  46  .  58  56  55  52  16  43  13-26  16  9  12 

29.00  3.66  78  77  65  58  .  73  68  63  15  15  11-19  17  12  19 

23.60  3.10  68  72  48  56  73  .  71  61  32  18  27-16  8  11  22 

9.56  1.36  74  78  55  55  68  71  .  79  19  19  16-17  14  17  19  14  14 

10.45 


<3««a 


1.53  70  68  63  52  63  61  79  .  15  15  15-28  21 
2.09  18  20  6  16  IS  32  19  15  .  20  26  -3  11 
251.75  32.78  22  17  7  43  15  18  19  15  20  .  13-12  7 

2.28  0.76  IS  11  4  13  11  27  16  15  26  13  .  1  -3 

O.n  0.70-28-22-27-26-19-16-17-28  -3-12  I  .-39 
0.01  0.47  21  19  26  16  17  8  14  22  11  7  -3-39  . 

31.58  4.63  15  14  6  9  12  11  17  14  8  -1  4  -4  4 


14  16  11 
8  16  6 
-1  6  4 
6  5 
-0  -8 
4  6 
18  12 


4 

-4 

4 


8  4  1  12  10 

10  10  1  10  13 

5  7  -3  11  10 

7  5  2  0  3 

8  12  2  14  16 

14  19  7  2  13 

6  10  17  11 
10  -0  16  18  17 

3  11-11  2  -1 
2  4-6  1-3 

4  6  -3  7  3 
-3  2  -8  -8  -5 

1  -3  15  15  16 
6  II  13  11  10 


8  4 
/  / 

6  a 

-2  -0 

9  5 
6  4 


15  1C  18  13 
12  12  15  IS 
-3  3  5 
7  17  11 


14 


6  19 

•  t 


11 

1 


a 
10 
9 

-2-12 

-9  -1 

0  0 

6  2 

t  c 


9 

19 

te 


10 


6 
9 
a 

10 

-4 
-)  -2 

-6  -3 

•• 

/ 

•5 


4  9 
-3  2 
-2  -I 

5  -? 
1  2 

IC  a 


-e 

_e 

te' 


-2 

* 

-0 
■ » 
4 
6 
-1 


50.04  10.28 

18  21 

9  12  19  22  19  16  16  6 

6  -0 

4 

18  .  20  21 

18  18  34  26  21 

17 

5 

*te 

*7 

1*9 

15  -7 

31.76 

5.16 

8  10 

5 

7 

8 

14  14  11  6  4 

5  -8 

6 

12  20  .  9 

15  10  20  21  21 

9 

15 

1  ? 

•  a 

IS 

9 

10  -6 

10.57 

2.17 

4  10 

7 

5  12 

19  14  10  8  2 

4  -3 

1 

621  9  . 

31  14  21  13  30 

14 

/ 

e 

*99 

16 

•  i 

11-10 

10.56 

1.63 

1  1 

-3 

4i 

m 

4 

7  6  -0  11  4 

6  2 

-3 

11  18  15  31 

.  1  4  3  19 

16 

«  • 

*• 

•te 

a 

•  m 

•  te 

10  -1 

33.44 

5.90 

12  10 

11 

0  14 

2  10  16-11  -6 

-3  -8  15 

13  18  10  14 

1  .  60  53  35 

IS 

•» 

40 

mm 

23 

m  » 

•  <  • 

50.11 

9.99 

10  13 

10 

3  16 

13  17  18  2  1 

7  -8  15 

11  34  20  21 

4  60  .  60  51 

32 

mm 

49  46  35 

A*  • 

te*  • 

25.52 

4.55 

8  7 

6 

-2 

9 

6  11  17  -1  -3 

3  -5  16 

10  26  21  13 

8  53  60  .  40 

24 

20 

36  37  33 

mm 

23  0 

13.54 

4.62 

4  7 

6  -0 

5 

4  7  11  7-3 

2  -2  10 

7  21  21  30 

19  35  51  40  . 

26 

18 

r? 

33 

31 

36 

24  -3 

2.03 

1.19 

9  15  12 

3 

9 

16  15  14  -0  -2  -9  0 

6 

3  17  12  14 

16  18  S  24  26 

t 

15  IS  23 

23 

20 

17  -4 

6.88 

2.29 

8  10  12 

-3 

7 

7  6  10  9-12 

-1  0 

2 

5  5  9  7 

2  15  22  20  18  15 

• 

^9 

te* 

20  21 

17 

« A  A 

1*  -2 

40.20 

7.04 

19  18  15 

3  17 

9  19  19  4  -2 

1  -7 

0 

14  12  15  9 

11  40  38  36  22 

18 

m4 

a 

49 

49  33 

37  2 

17.85 

3.36 

8  13 

18 

5 

11 

12  15  15  4  -3  -0  -6  10 

7  27  17  21 

12  33  49  37  33 

mm 

20 

i9 

• 

60  40 

44  -1 

14.45 

3.35 

7  12 

14 

7 

* 

12 

6  9  9  10  -4 

-2  -3 

7 

3  23  lo  16 

9  28  46  33  31 

mti 

te* 

49 

60 

• 

AM 

-T 

40  -a 

3.12 

1.23 

6  8 

6 

-1 

5 

5  9  6  4  -3 

-2  5 

1 

10  12  9  17 

13  24  35  27  36  20  17  33 

40 

39 

• 

32  -0 

6.02 

1.90 

8  9 

10 

1 

5 

4  8  13  9  2 

-1  -7 

2 

6  15  10  11 

10  19  31  23  24 

17 

12 

37 

•  • 

40 

32 

•  • 

0.29 

0.51 

-6  -5 

-3 

-7 

-2 

-2  1  0-0-5 

4  6 

-1 

-0  -7  -6-10 

-114  0-3 

•4 

.7 

m 

m 

•i 

-6  -0 

• 

•  B 



SUMMARY OF RESULTS

FROM THE WITHIN-MOS PRINCIPAL COMPONENT ANALYSES
OF THE PERFORMANCE FACTOR SCORES


•  Emergence  of  method  factors 
—  Written  knowledge  tests 
—  Rating  scales 


•  Correspondence between Army-wide BARS and administrative measures
factors

—  Effort and Leadership from the Army-wide BARS
   and
   Letters and Certificates from the administrative measures

—  Personal Discipline from the Army-wide BARS
   and
   Articles 15/Flag Actions and Promotion Rate Deviation
   Score from the administrative measures

—  Physical Fitness and Military Bearing from the Army-wide BARS
   and
   Physical Readiness Test Score from the administrative measures


•  Lack of correspondence between performance test scores (i.e.,
measures of "maximal" performance) and performance ratings (i.e.,
measures of "typical" performance)


•  Distinction between technical (MOS-specific) factor and remaining
general soldiering factors


•  Lack of distinction among different general soldiering factors
(i.e., Communications, Vehicle Operation and Maintenance, etc.)




LATENT CONSTRUCTS
UNDERLYING THE PROJECT A ENLISTED PERFORMANCE MEASURES

•  "Content" constructs

—  Core Technical Proficiency
—  General Soldiering Proficiency
—  Effort and Leadership
—  Personal Discipline
—  Physical Fitness and Military Bearing

•  "Method" constructs

—  Written knowledge tests
—  Rating scales




DESCRIPTIONS OF THE PERFORMANCE CONSTRUCTS


•  Core  Technical  Proficiency 

This performance construct represents the proficiency
with which the soldier performs the tasks that are
"central" to the MOS. The tasks represent the core of
the job and they are the primary definers of the MOS.

For example, the first tour Armor Crewman starts and
stops the tank engines; prepares the loader's station;
loads and unloads the main gun; boresights the M60A3;
engages targets with the main gun; and performs misfire
procedures. This performance construct does not include
the individual's willingness to perform the task or the
degree to which the individual can coordinate efforts
with others. It refers to how well the individual can
execute the core technical tasks the job requires, given
a willingness to do so.


•  General Soldiering Proficiency

In addition to the core technical content specific to an
MOS, individuals in every MOS also are responsible for
being able to perform a variety of general soldiering
tasks, for example: determines grid coordinates on
military maps; puts on, wears, and removes M17 series
protective mask with hood; determines a magnetic azimuth
using a compass; collects/reports information - SALUTE;
and recognizes and identifies friendly and threat
aircraft. Performance on this construct represents
overall proficiency on these general soldiering tasks.
Again, it refers to how well the individual can execute
general soldiering tasks, given a willingness to do so.


•  Effort and Leadership

This performance construct reflects the degree to which
the individual exerts effort over the full range of job
tasks, perseveres under adverse or dangerous conditions,
and demonstrates leadership and support toward peers.

That is, can the individual be counted on to carry out
assigned tasks, even under adverse conditions, to exercise
good judgment, and to be generally dependable and
proficient? While appropriate knowledges and skills are
necessary for successful performance, this construct is
only meant to reflect the individual's willingness to do
the job required and to be cooperative and supportive
with other soldiers.




•  Personal Discipline

This performance construct reflects the degree to which
the individual adheres to Army regulations and traditions,
exercises personal self-control, demonstrates integrity
in day-to-day behavior, and does not create disciplinary
problems. People who rank high on this construct show a
commitment to high standards of personal conduct.


•  Physical Fitness and Military Bearing

This performance construct represents the degree to
which the individual maintains an appropriate military
appearance and bearing and stays in good physical
condition.




MAPPING OF PERFORMANCE FACTORS ONTO LATENT PERFORMANCE CONSTRUCTS




Note: Within each rating instrument, all of the factors were constrained to have an equal loading on the Rating Scales method construct. For example,
the Perform Well and Avoid Mistakes factors from the Combat Performance Prediction Scales were constrained to have identical loadings on the
Rating Scales method construct, but this loading did not have to be the same as the loading for the Army-Wide BARS factors, the MOS-Specific


UNIQUENESS ESTIMATES
SEPARATE MODEL FOR EACH JOB


Military  Occupational  Specialty 


Factor  Score 

11B

13B 

19E 

31C 

63B 

64C 

71L 

91A 

95B 

HO  Tech 

.52 

.71 

.48 

.64 

.74 

.33 

.57 

.88 

HO  Soldier 

.59 

.66 

.75 

.52 

.95 

.74 

.55 

.76 

.63 

HO  Safety 

.92 

.85 

.75 

.52 

.95 

.59 

.79 

.71 

.77 

HO  Comm 

.95 

.95 

.81 

.62 

— 

.82 

HO  Vehicle 

— 

— 

— 

.03 

.95 

** 

-- 

“ 

.90 

JK  Tech 

.21 

.30 

.15 

.12 

.39 

.17 

.11 

.53 

JK  Soldier 

.10 

.43 

.22 

.26 

.29 

.74 

.31 

.58 

.43 

JK  Safety 

.32 

.53 

.32 

.31 

.45 

.49 

.44 

.15 

.57 

JK  Comm 

.56 

.93 

.32 

.34 

— 

.64 

JK  Vehicle 

— 

.56 

.32 

** 

•• 

.94 

.82 

JK  Identify 

.36 

.89 

.40 

.51 

— 

.95 

.92 

.90 

SK  Tech 

.27 

.13 

.09 

.10 

.14 

.14 

.15 

.52 

SK  Soldier 

.09 

.37 

.14 

.48 

.31 

.42 

.54 

.74 

.46 

SK  Safety 

.46 

.59 

.43 

.41 

.50 

.55 

.72 

.47 

.55 

SK  Comm 

.40 

.72 

.35 

.65 

.82 

.78 

.67 

SK  Vehicle 

.73 

.62 

.69 

.55 

.18 

** 

.73 

.76 

.75 

SK  Identify 

— 

.45 

.10 

.22 

.25 

.25 

.34 

.10 

.13 

Overall  Rating 

.13 

.13 

.13 

.13 

.13 

.13 

.13 

.13 

.18 

Eff/Ldr  Rating 

.11 

.11 

.11 

.11 

.11 

.05 

.11 

.11 

.05 

Discpln  Rating 

.22 

.22 

.22 

.22 

.22 

.05 

.22 

.22 

.06 

Fitness  Rating 

.38 

.38 

.38 

.38 

.38 

.05 

.38 

.38 

.05 

MOS  Tech  Rtngs 

.08 

.11 

.13 

.14 

.08 

.37 

.17 

.12 

.33 

MOS  Other  Rtng 

.10 

.13 

.17 

.19 

.12 

.35 

.20 

.18 

.27 

Comb  Exmplry 

.02 

.02 

.02 

.02 

.02 

.14 

.02 

.02 

.08 

Comb  Problems 

.13 

.13 

.13 

.13 

.13 

.60 

.13 

.13 

.40 

Awards/Cert 

.89 

.94 

.93 

.95 

.91 

.94 

.86 

.85 

.90 

Phys  Readiness 

.95 

.33 

.67 

.34 

.50 

.83 

.46 

.49 

.49 

Articles  15 

.58 

.59 

.68 

.60 

.56 

.76 

.51 

.75 

.64 

Promotion  Rate 

.45 

.60 

.53 

.41 

.57 

.64 

.62 

.67 

.70 

M16 

.50 

.50 

.50 

.50 

.50 

.50 

.50 

.50 

.50 

**  Vehicle  content  was  merged  into  the  Technical  factor  for  64C. 




GOODNESS-OF-FIT INDICES
SEPARATE MODEL FOR EACH JOB

                                   Root Mean
MOS                                Square Residual   Chi-Square    df     p

11B: Infantryman                        .061            326.2      227   .02
13B: Cannon Crewman                     .057            350.0      322   .14
19E: Tank Crewman                       .065            170.0      348   .999
31C: Radio/Teletype Operator            .069            369.2      375   .58
63B: Vehicle/Generator Mechanic         .060            332.1      296   .07
64C: Motor Transport Operator           .058            280.1      247   .07
71L: Administrative Clerk               .067            232.6      249   .77
91A: Medical Specialist                 .061            277.1      275   .45
95B: Military Police                    .052            470.0      374   .001
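
The Root Mean Square Residual column summarizes how far the correlations implied by each fitted model fall from the observed correlations. The report does not state the exact formula used, so the following is only a minimal sketch under a common definition (square root of the mean squared residual over the unique off-diagonal elements; some programs also include the diagonal). The function name `rmsr` and the two 3 x 3 matrices are hypothetical and are not Project A data.

```python
import math


def rmsr(observed, implied):
    """Root mean square residual over the unique off-diagonal elements.

    Both arguments are square, symmetric correlation matrices given as
    lists of lists. This definition is an assumption for illustration.
    """
    n = len(observed)
    squared = [
        (observed[i][j] - implied[i][j]) ** 2
        for i in range(n)
        for j in range(i)  # lower triangle only, diagonal excluded
    ]
    return math.sqrt(sum(squared) / len(squared))


# Hypothetical matrices for illustration only (not taken from the report).
observed = [
    [1.00, 0.50, 0.30],
    [0.50, 1.00, 0.40],
    [0.30, 0.40, 1.00],
]
implied = [
    [1.00, 0.45, 0.35],
    [0.45, 1.00, 0.40],
    [0.35, 0.40, 1.00],
]

print(round(rmsr(observed, implied), 3))
```

Values near zero indicate that the model reproduces the observed correlations closely, which is how the tabled values of roughly .05 to .07 would be read under this definition.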



FACTOR LOADINGS


SEPARATE MODEL FOR EACH JOB


Military  Occupational  Specialty 


Construct/Factor 

11B

13B

19E

31C

63B 

64C 

71L 

91A 

95B 

Core  Technical 

HO  Tech 

— 

.61 

.47 

.64 

.51 

.29 

.77 

.59 

.32 

JK  Tech 

— - 

.75 

.78 

.79 

.74 

.26 

.78 

.75 

.32 

SK  Tech 

— 

.70 

.79 

.73 

.82 

.55 

.229 

.81 

.43 

MOS  Tech  Rtng 

~ 

.45 

.10 

.22 

.25 

.25 

.34 

.10 

.13 

General  Soldiering 

HO  Soldier 

.60 

.51 

.46 

.64 

.17 

.50 

.60 

.42 

.60 

HO  Safety 

.26 

.33 

.32 

.31 

.12 

.63 

.37 

.48 

.47 

HO  Comm

.05 

.06 

.39 

.56 

.. 

... 

~ 

.80 

HO  Vehicle 

~ 

~ 

~ 

.22 

.17 

** 

— 

— 

.31 

JK  Soldier 

.76 

.52 

.74 

.62 

.45 

.48 

.87 

.58 

.46 

JK  Safety

.55 

.37 

.75 

.38 

.71 

.51 

.72 

.58 

.33 

JK  Comm

.30 

.23 

.66 

.38 

~ 

~ 

~ 

-- 

.29 

JK  Vehicle 

— 

.17 

.10 

.41 

** 

-- 

.35 

JK  Identify 

.46 

~ 

.20 

.28 

— 

.12 

~ 

.24 

.21 

SK  Soldier 

.73 

.45 

.67 

.39 

.78 

.56 

.45 

.44 

.42 

SK  Safety 

.47 

.32 

.53 

.62 

.57 

.47 

.30 

.64 

.32 

SK  Comm

.42 

.26 

.42 

-- 

.41 

.35 

.20 

— 

.20 

SK  Vehicle 

.22 

.24 

.05 

.30 

.61 

** 

.22 

.47 

.28 

SK  Identify 

.46 

~ 

.46 

.13 

— — 

— 

— 

— 

~ 

Effort/Leadership 

Eff/Ldr  Rating 

.76 

.56 

.85 

.64 

.68 

.83 

.66 

.76 

.70 

MOS  Tech  Rtngs 

.70 

— 

.63 

.40 

.41 

.50 

.25 

.59 

.52 

MOS  Other  Rtng 

.77 

.41 

.48 

.43 

.54 

.62 

.43 

.61 

.56 

Comb  Exmplry

.80 

.47 

.68 

.54 

.57 

.87 

.63 

.80 

.77 

Comb  Problems

.48 

.20 

-- 

.39 

.52 

.53 

.55 

-- 

.56 

Awards/Cert 

.32 

.23 

.24 

.19 

.28 

.25 

.34 

.34 

.22 

Overall  Rating 

.46 

.39 

.33 

.17 

.57 

.42 

.65 

.41 



FACTOR LOADINGS


SEPARATE MODEL FOR EACH JOB
(continued)


Military  Occupational  Specialty 


Construct/Factor 

11B

13B

19E

31C

63B 

64C 

71L 

91A 

95B 

Discipline 

Discpln  Rtng 

.77 

.58 

.73 

.45 

.63 

.85 

.74 

.58 

.73 

Comb  Problems

.29 

.16 

.62 

.03 

.05 

.19 

-- 

.82 

.33 

Articles  15  -.63 

-.61 

-.55 

-.62 

-.65 

-.47 

-.69  -.46 

-.60 

Promotion  Rate 

.74 

.61 

.68 

.79 

.63 

.57 

.59 

.54 

.54 

Overall  Rating 

.39 

.20 

.53 

.54 

.09 

.42 

.06 

.75 

.38 

Fitness/Bearing 

Fitness  Ratngs 

.69 

.23 

.84 

.48 

.54 

.42 

.50 

.60 

.78 

Phys  Readiness 

.11 

.90 

.49 

.89 

.70 

.53 

.76 

.69 

.69 

Ratings  Method

AW  Ratings

.60 

.73 

.47 

.70 

.66 

.54 

.65 

.66 

.66 

MOS  Ratings 

.73 

.73 

.60 

.69 

.67 

.49 

.69 

.54 

.63 

Comb  Ratings 

.47 

.65 

.55 

.69 

.57 

.27 

.55 

.47 

.40 

Written  Method 

JK  Tech 

~ 

.47 

.28 

.55 

.59 

.73 

.44 

.58 

.57 

JK  Soldier 

.41 

.51 

.33 

.40 

.61 

.57 

.11 

.37 

.59 

JK  Safety

.37 

.52 

.12 

.63 

.08 

.49 

.17 

.76 

.57 

JK  Comm 

.34 

.11 

.07 

.55 

~ 

~ 

~ 

— - 

.52 

JK  Vehicle 

— 

~ 

— 

.42 

.62 

** 

-- 

.24 

.21 

JK  Identify  -.15 

.23 

.50 

.36 

— 

.05 

~ 

.08 

.23 

SK  Tech 

— — 

.48 

.48 

.55 

.46 

.88 

.42 

.27 

.50 

SK  Soldier 

.50 

.66 

.54 

.59 

.15 

.51 

.54 

~ 

.54 

SK  Safety 

.53 

.55 

.42 

.29 

.34 

.48 

.44 

.19 

.60 

SK  Comm 

.51 

.47 

.46 

— 

.16 

.24 

.05 

-- 

.42 

SK  Vehicle 

.49 

.57 

.24 

.48 

.55 

** 

.38 

.05 

.42 

SK  Identify 

.21 

— 

.42 

.44 

— — 

— — 

“ 

~ 

— 

M16  Qualification 

.71 

.71 

.71 

.71 

.71 

.71 

.71 

.71 

.71 

**  Vehicle content was merged into the Core Technical factor for 64C.
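
The loadings above and the uniqueness estimates tabled earlier are linked through the usual common factor identity: for a standardized indicator, uniqueness equals one minus the communality, and the communality is the quadratic form of the loadings with the construct correlation matrix. The sketch below assumes standardized scores, which the report does not state explicitly; the function name `uniqueness` is hypothetical. It is applied to M16 Qualification, which loads .71 on a single construct and has a tabled uniqueness of .50.

```python
def uniqueness(loadings, construct_corr):
    """Return 1 - lambda' * Phi * lambda for one standardized indicator.

    loadings       -- loadings of the indicator on each construct
    construct_corr -- correlation matrix among those constructs
    Standardized scores are an assumption made for illustration.
    """
    k = len(loadings)
    communality = sum(
        loadings[i] * construct_corr[i][j] * loadings[j]
        for i in range(k)
        for j in range(k)
    )
    return 1.0 - communality


# M16 Qualification: a single loading of .71 on its own construct.
m16 = uniqueness([0.71], [[1.0]])
print(round(m16, 2))

# Hypothetical indicator loading on two uncorrelated constructs.
two = uniqueness([0.6, 0.5], [[1.0, 0.0], [0.0, 1.0]])
print(round(two, 2))
```

Indicators that load on both a content construct and a method construct need the full quadratic form, because the construct correlation enters the communality whenever it is not zero.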



ESTIMATED CONSTRUCT CORRELATIONS
SEPARATE MODEL FOR EACH JOB


Military  Occupational  Specialty 


1st  Construct 

2nd  Construct 

11B

13B 

19E 

31C 

63B 

64C 

71L 

91A 

95B 

Core 

Gen  Soldiering 

.77 

.83 

.63 

.58 

.73 

.48 

.66 

.70 

Technical 

Effort/Lead 

.67 

.86 

.51 

.44 

.50 

.78 

.44 

.35 

.46 

Discipline 

.42 

.13 

.37 

.26 

.12 

.69 

.19 

.43 

.50 

Fitness 

.25 

.01 

.03 

.04 

-.18 

-.09 

.10 

-.05 

-.09 

M16 

.27 

.00 

.04 

.11 

.05 

.05 

-.09 

-.17 

-.10 

General 

Effort/Lead 

— 

.89 

.58 

.57 

.53 

.44 

.37 

.43 

.40 

Soldiering 

Discipline 

-- 

.29 

.45 

.30 

.29 

.29 

.04 

.37 

.24 

Fitness 

— 

-.19 

.05 

-.05 

-.03 

-.14 

.09 

-.05 

.00 

M16

—

-.06 

.30 

.30 

.04 

.11 

.27 

.02 

.02 

Effort/ 

Discipline 

.49 

.67 

.62 

.55 

.65 

.51 

.51 

.59 

.39 

Leadership 

Fitness 

.57 

.04 

.38 

-.11 

.10 

.23 

.32 

.21 

.42 

M16 

.38 

-.13 

.21 

.24 

-.02 

.35 

.22 

.17 

.28 

Discipline 

Fitness 

.33 

.05 

.24 

.24 

.30 

.30 

.27 

.19 

.25 

M16 

-.12 

-.25 

-.30 

.09 

-.28 

-.11 

.01 

-.28 

-.08 

Fitness 

M16 

• 

ill 

M 

.26 

-.05 

.02 

.19 

.22 

.18 

.27 

.26 



TESTING THE LATENT STRUCTURE MODEL
ACROSS ALL NINE MOS SIMULTANEOUSLY:
ASSUMPTIONS


•  Intercorrelations among the performance constructs
are the same for all MOS


•  The loadings of the Army-wide factors (i.e., the
Army-wide BARS factors, the combat factors, and the
administrative measures "factors") on the content
and method constructs are constant across MOS


•  No M16 factor or construct


UNIQUENESS ESTIMATES
SINGLE MODEL ACROSS ALL JOBS


Military  Occupational  Specialty 


Factor  Score 

11B

13B

19E

31C

63B 

64C 

71L 

91A 

95B 

HO  Tech 

.62 

.79 

.62 

.76 

.91 

.44 

.68 

.90 

HO  Soldier 

.72 

.58 

.80 

.70 

.95 

.73 

.64 

.87 

.67 

HO  Safety 

.95 

.84 

.90 

.87 

.95 

.73 

.90 

.75 

.81 

HO  Comm 

.95 

.95 

.86 

.71 

~ 

~ 

-- 

~ 

.82 

HO  Vehicle 

— 

“ 

.95 

.95 

** 

— — 

~ 

.93 

JK  Tech 

.23 

.28 

.13 

.15 

.32 

.28 

.16 

.60 

JK  Soldier 

.10 

.44 

.28 

.40 

.48 

.41 

.44 

.47 

.40 

JK  Safety 

.48 

.56 

.41 

.49 

.62 

.44 

.55 

.26 

.54 

JK  Comm 

.85 

.91 

.57 

.55 

— 

•• 

— 

.67 

JK  Vehicle 

— 

.87 

.44 

** 

— 

.95 

.85 

JK  Identify 

.71 

.90 

.84 

.81 

— 

.95 

— 

.64 

.90 

SK  Tech 

.25 

.10 

.24 

.18 

.17 

.27 

.19 

.54 

SK  Soldier 

.13 

.37 

.20 

.52 

.41 

.31 

.58 

.83 

.49 

SK  Safety 

.54 

.62 

.54 

.51 

.55 

.51 

.80 

.29 

.54 

SK  Comm 

.46 

.75 

.48 

— 

.77 

.78 

.92 

-- 

.70 

SK  Vehicle 

.75 

.68 

.95 

.61 

.31 

** 

.86 

.86 

.75 

Overall  Rating*

.18 

.18 

.18 

.18 

.18 

.18 

.18 

.18 

.18 

Eff/Ldr  Rating* 

.09 

.09 

.09 

.09 

.09 

.09 

.09 

.09 

.09 

Discpln  Rating*

.17 

.17 

.17 

.17 

.17 

.17 

.17 

.17 

.17 

Fitness  Rating* 

.05 

.05 

.05 

.05 

.05 

.05 

.05 

.05 

.05 

MOS  Tech  Rtngs* 

.18 

.34 

.22 

.24 

.18 

.18 

.18 

.18 

.25 

MOS  Other  Rtng* 

.05 

.24 

.46 

.37 

.05 

.05 

.05 

.05 

.27 

Comb  Exmplry*

.26 

.26 

.26 

.26 

.26 

.26 

.26 

.26 

.26 

Comb  Problems* 

.29 

.29 

.29 

.29 

.29 

.29 

.29 

.29 

.29 

Awards/Cert* 

.93 

.93 

.93 

.93 

.93 

.93 

.93 

.93 

.93 

Phys  Readiness* 

.83 

.83 

.83 

.83 

.83 

.83 

.83 

.83 

.83 

Articles  15* 

.77 

.77 

.77 

.77 

.77 

.77 

.77 

.77 

.77 

Promotion  Rate* 

.70 

.70 

.70 

.70 

.70 

.70 

.70 

.70 

.70 

*  These  loadings  were  constrained  to  be  equal  across  all  MOS. 

** Vehicle content was merged into the Core Technical factor for 64C.




GOODNESS  OF  FIT  INDICES 
SINGLE  MODEL  ACROSS  ALL  JOBS 

•  Chi-square = 2508.1
   df = 2403
   p = .07

•  Root Mean Square Residual = .047
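
As a rough check on the reported fit, the p-value can be approximated from the chi-square and degrees of freedom alone. The sketch below uses the Wilson-Hilferty normal approximation, which is an assumption made here for illustration and not the procedure used in the original analysis; the function name is hypothetical. The result should land near the reported p = .07.

```python
import math

def chi_square_upper_tail(chi_sq: float, df: int) -> float:
    """Approximate P(X >= chi_sq) for a chi-square variable with df degrees
    of freedom, using the Wilson-Hilferty cube-root normal approximation."""
    k = float(df)
    spread = 2.0 / (9.0 * k)
    z = ((chi_sq / k) ** (1.0 / 3.0) - (1.0 - spread)) / math.sqrt(spread)
    # Upper tail of the standard normal distribution.
    return 0.5 * math.erfc(z / math.sqrt(2.0))

# Values reported on the slide above.
p_value = chi_square_upper_tail(2508.1, 2403)
print(f"approximate p = {p_value:.3f}")
```

A nonsignificant chi-square of this kind indicates that the single model does not fit significantly worse than the data allow, which is the point of the slide.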




FACTOR LOADINGS
SINGLE MODEL ACROSS ALL JOBS


Military  Occupational  Specialty 


Construct/Factor 

11B

13B 

19E 

31C

63B 

64C 

71L 

91A 

95B 

Core  Technical 

HO  Tech 

~ 

.59 

.43 

.58 

.46 

.27 

.71 

.54 

.29 

JK  Tech 

— 

.71 

.79 

.76 

.57 

.72 

.70 

.74 

.37 

SK  Tech 

-- 

.66 

.70 

.54 

.73 

.55 

.68 

.85 

.42 

MOS  Tech  Rtng 

— — 

.21 

.12 

.16 

.25 

.01 

.12 

.05 

-.02 

General  Soldiering 

HO  Soldier 

.52 

.66 

.44 

.52 

.16 

.51 

.57 

.35 

.58 

HO  Safety 

.20 

.44 

.31 

.36 

.10 

.49 

.30 

.50 

.41 

HO Comm

.06 

.12 

.37 

.52 

— 

— 

— 

— 

.43 

HO Vehicle

— 

— 

~ 

.15 

.21 

** 

— 

.27 

JK  Soldier 

.95 

.50 

.79 

.64 

.42 

.69 

.66 

.69 

.49 

JK  Safety 

.69 

.36 

.75 

.45 

.53 

.66 

.57 

.65 

.42 

JK Comm

.35 

.25 

.59 

.51 

... 

•• 

.39 

JK  Vehicle 

-- 

— 

-- 

.28 

.37 

** 

.. 

.07 

.34 

JK  Identify 

.43 

.21 

.34 

.36 

— 

.12 

-- 

.39 

.18 

SK  Soldier 

.81 

.40 

.67 

.33 

.70 

.50 

.42 

.40 

.38 

SK  Safety 

.57 

.34 

.45 

.40 

.63 

.43 

.31 

.62 

.34 

SK  Comm 

.51 

.21 

.31 

~ 

.42 

.29 

.17 

— 

.23 

SK  Vehicle 

.35 

.22 

.06 

.17 

.65 

** 

.32 

.36 

.21 

Effort/Leadership 

Eff/Ldr  Rating* 

.76 

.76 

.76 

.76 

.76 

.76 

.76 

.76 

.76 

MOS  Tech  Rtngs* 

.59 

.33 

.54 

.50 

.45 

.62 

.43 

.62 

.62 

MOS  Other  Rtng* 

.77 

.59 

.33 

.45 

.59 

.48 

.47 

.58 

.58 

Comb Exmplry*

.72 

.72 

.72 

.72 

.72 

.72 

.72 

.72 

.72 

Comb  Problem* 

.44 

.44 

.44 

.44 

.44 

.44 

.44 

.44 

.44 

Awards/Cert*

.26 

.26 

.26 

.26 

.26 

.26 

.26 

.26 

.26 

Overall  Rating* 

.48 

.48 

.48 

.48 

.48 

.48 

.48 

.48 

.48 



FACTOR LOADINGS
SINGLE MODEL ACROSS ALL JOBS
(continued)


Military  Occupational  Specialty 


Construct/Factor 

11B

13B 

19E 

31C

63B 

64C 

71L 

91A 

95B 

Discipline 

Discpln  Rtng* ** 

.69 

.69 

.69 

.69 

.69 

.69 

.69 

.69 

.69 

Comb Problems*

.25 

.25 

.25 

.25 

.25 

.25 

.25 

.25 

.25 

Articles  15* 

-.48 

-.48 

-.48 

-.48 

-.48 

-.48 

-.48 

-.48 

-.48 

Promotion  Rate* 

.52 

.52 

.52 

.52 

.52 

.52 

.52 

.52 

.52 

Overall  Rating* 

.28 

.28 

.28 

.28 

.28 

.28 

.28 

.28 

.28 

Fitness/Bearing 

Fitness  Ratngs* 

.82 

.82 

.82 

.82 

.82 

.82 

.82 

.82 

.82 

Phys  Readiness* 

.37 

.37 

.37 

.37 

.37 

.37 

.37 

.37 

.37 

Ratings  Method 

AW Ratings*

.56 

.56 

.56 

.56 

.56 

.56 

.56 

.56 

.56 

MOS  Ratings* 

.61 

.61 

.61 

.61 

.61 

.61 

.61 

.61 

.61 

Comb  Ratings* 

.42 

.42 

.42 

.42 

.42 

.42 

.42 

.42 

.42 

Written  Method 

JK  Tech 

~ 

.49 

.29 

.54 

.71 

.30 

.42 

.49 

.49 

JK  Soldier 

-.16 

.51 

.29 

.40 

.53 

.25 

.28 

.60 

.60 

JK  Safety 

-.07 

.49 

.07 

.52 

.26 

.28 

.35 

.52 

.52 

JK  Comm 

.00 

.11 

.19 

.38 

— 

~ 

-- 

.41 

.41 

JK  Vehicle 

~ 

~ 

~ 

.19 

.62 

** 

.20 

.20 

JK  Identify 

-.05 

.20 

.12 

.17 

— 

.10 

~ 

.25 

.25 

SK  Tech 

~ 

.54 

.65 

.64 

.49 

.71 

.45 

.53 

.53 

SK  Soldier 

.44 

.68 

.58 

.61 

.25 

.66 

.50 

.60 

.60 

SK  Safety 

.34 

.51 

.49 

.57 

.18 

.56 

.30 

.59 

.59 

SK  Comm 

.51 

.46 

.60 

— 

.20 

.36 

.20 

.50 

.50 

SK  Vehicle 

.38 

.51 

.17 

.60 

.45 

** 

.17 

.46 

.46 

* These loadings were constrained to be equal across all MOS.

**  Vehicle  content  was  merged  into  the  Core  Technical  factor  for  64C. 




ESTIMATED PERFORMANCE CONSTRUCT CORRELATIONS
SINGLE MODEL ACROSS ALL JOBS


First Construct       Second Construct      Correlation

Core Technical        General Soldiering        .80
                      Effort/Leadership         .48
                      Discipline                .35
                      Fitness/Bearing           .01

General Soldiering    Effort/Leadership         .47
                      Discipline                .35
                      Fitness/Bearing           .06

Effort/Leadership     Discipline                .67
                      Fitness/Bearing           .42

Discipline            Fitness/Bearing           .40



SCORING  THE  CONSTRUCTS: 
PRELIMINARY  DECISIONS 


•  "Rational" vs. regression weights


•  Regression weights were not used because:
   --  They would be difficult to explain
   --  We were concerned about their stability


•  Mapping of factors onto constructs

   Each factor was assigned to one
   construct.  If a factor was assigned
   to two constructs in the latent
   structure model, for scoring purposes
   it was assigned to the construct on
   which it had the highest loading.


•  Unit weights by measurement method

   As an intermediate step in computing
   construct scores, we first computed
   construct subscores by combining all
   of the factors from a given measurement
   method.  We then summed these subscores
   to compute the total construct score.
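
The two scoring decisions above can be sketched in a few lines. This is a minimal illustration, not the project's scoring program: the function names, the method labels, and the example factor scores are hypothetical, and the sketch assumes the factor scores are already on a comparable scale (the slide does not say whether subscores were rescaled before summing). The two loadings for the Overall Rating come from the factor loading tables.

```python
from collections import defaultdict

def assign_construct(loadings):
    """Return the construct on which a factor has its highest loading."""
    return max(loadings, key=loadings.get)

def construct_score(factor_scores, factor_method):
    """Sum factor scores within each measurement method (unit weights),
    then sum the method subscores into one construct score."""
    subscores = defaultdict(float)
    for factor, score in factor_scores.items():
        subscores[factor_method[factor]] += score
    return dict(subscores), sum(subscores.values())

# The Overall Rating loads on two constructs in the latent structure model
# (.48 on Effort/Leadership, .28 on Discipline), so it is scored with the first.
overall_rating_home = assign_construct(
    {"Effort/Leadership": 0.48, "Discipline": 0.28}
)

# Hypothetical factor scores for one soldier on one construct.
scores = {"HO Tech": 1.0, "JK Tech": -0.5, "SK Tech": 0.25}
methods = {"HO Tech": "hands_on", "JK Tech": "written", "SK Tech": "written"}
subscores, total = construct_score(scores, methods)
print(overall_rating_home, subscores, total)
```

Grouping by method first means each measurement method contributes one subscore to the construct, regardless of how many factors that method produced.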




CONSTRUCT SUBSCORES


Core Technical Proficiency

--  Hands-On Subscore:  Average percent GO across all Core
    Technical tasks

--  Knowledge Subscore:  Sum of job and school knowledge Core
    Technical items answered correctly


General Soldiering Proficiency

--  Hands-On Subscore:  Average percent GO across all General
    Soldiering tasks

--  Knowledge Subscore:  Sum of job knowledge and school
    knowledge General Soldiering items answered correctly


Effort and Leadership

--  Overall Effectiveness Subscore:  Overall Effectiveness
    rating from the Army-wide BARS

--  BARS Subscore:  Sum of (1) the Effort and Leadership
    factor from the Army-wide BARS, and (2) the Core factor
    and (3) the Other factor from the MOS-specific BARS

--  Combat Subscore:  Average rating across all of the items
    from the Combat Performance Prediction Scales

--  Administrative Measures Subscore:  Letters and
    Certificates factor score from the administrative
    measures


Personal Discipline

--  Army-wide BARS Subscore:  Personal Discipline factor
    score from the Army-wide BARS

--  Administrative Measures Subscore:  Sum of (1) the
    Articles 15/Flag Actions and (2) the Promotion Rate
    Deviation factor scores from the administrative measures


Physical Fitness and Military Bearing

--  Army-wide BARS Subscore:  Physical Fitness and Military
    Bearing factor score from the Army-wide BARS

--  Administrative Measures Subscore:  Physical Readiness
    Test factor score from the administrative measures


PROJECT  A  CONCURRENT  VALIDATION: 
TREATMENT  OF  MISSING  DATA 


Lauress  L.  Wise 
Jeffrey  J.  McHenry 
Winnie  Y.  Young 

American Institutes for Research


Presented  at  a  Data  Analysis  Workshop  of  the 
Committee on Performance of Military Personnel

Baltimore 

December  1986 


The  views  expressed  in  this  paper  are  those  of  the  authors  and  do  not 
necessarily  reflect  the  official  opinions  and  policies  of  the  U.S.  Army 
Research  Institute  or  the  Department  of  the  Army. 




PROJECT A CONCURRENT VALIDATION: TREATMENT OF MISSING DATA

The job performance data collected in the Project A Concurrent
Validation (CV) are unprecedented in scope. The data cover 19
diverse Military Occupational Specialties (MOS) and were collected
using an exhaustive array of different job performance measurement
methods. For nine MOS, designated Batch A, a complete set of
performance measures was developed and administered. For the
remaining MOS, designated Batch Z, an abbreviated set of
"Army-wide" measures was used.

The  measures  that  we  used  included: 

•  hands-on tests (HO): observation and scoring of
   performance on 15 carefully sampled job tasks

•  written tests of job knowledge (JK): tests of facts and
   procedures for 30 carefully sampled job tasks

•  written tests of school knowledge (SK): tests of facts
   and procedures taught during training for the MOS

•  ratings of performance by peers and supervisors on several
   sets of rating scales, including:

   -  11 Army-wide Behavior Summary Scales (AWB)

   -  7 to 13 MOS-specific Behavior Summary Scales (MSB)

   -  15 Job Task Rating Scales (JTR) for Batch A MOS

   -  11 Common Task Rating Scales (CTR) for Batch Z MOS

   -  40 Combat Performance Prediction Scales (CPP)

•  self-report of administrative and personnel records
   (ADM), including:

   -  letters and commendations

   -  Physical Readiness Test Score

   -  Marksmanship Score

   -  disciplinary actions (Articles 15, Flag Actions)

•  data from operational Army files on promotion rates and
   scores on Skill Qualification Tests.

Data  on  possible  moderators  of  job  performance  (job  history  and 
work  environment)  and  on  new  predictors  of  job  performance  (a 
half-day  paper-and-pencil  and  computer  test  battery)  also  were 
collected  during  the  CV. 

The  CV  data  collection  procedures  were  subjected  to  extensive 
pilot  and  field  tests.  The  data  collection  teams  were  extensively 
trained  and  were  supervised  by  senior  staff.  The  quality  and 
completeness  of  the  data  collected  attest  to  the  thoroughness  of 
these  procedures.  However,  notwithstanding  our  best  efforts,  the 
final  data  were  to  some  extent  incomplete.  The  purpose  of  this 
paper is to describe the amount of missing data in the Project A
CV data base, the problems posed by incomplete data, and the
steps taken to overcome those problems.

Reasons for Incomplete Data

Figure 1 lists some of the chief reasons for missing CV
data. Most of the reasons are self-explanatory, but a couple of
examples involving the hands-on tests may help to illustrate some
of the anticipated and unanticipated problems we encountered.

At Fort Hood, Texas, we were testing Armor Crewmen when a
spring in the breech block of one of the howitzers failed. On
that particular occasion, we had arranged for a back-up howitzer.
Consequently, we did not lose any data.

During hands-on testing of Infantrymen at Fort Benning,
Georgia, we were not so fortunate. The afternoon started bright
and sunny. Consequently, we decided to administer the tests at
our primary testing site, near a meandering creek, rather than at
our back-up bad weather site. The weather in central Georgia is
notoriously fickle on summer afternoons, though. A short time
later, we were caught in a deluge. The creek rose. Everyone was
up to their shins in water. And our test administrators were
scrambling madly, trying to protect their equipment and score
sheets from the driving rain. Unfortunately, one test
administrator simply was not quick enough. As he and the
hands-on test site manager watched, two score sheets began to
float away. A final effort was made to retrieve them, but before
they could be reached, they were sucked into the creek and carried
swiftly downstream. The thunderstorm abated a short while later.
However, valuable time had been lost, and it was not possible to
move all of the subjects through all of the test stations prior
to the end of the soldiers' work day. As a result, we were left
with quite a bit of missing data.

Two other problems encountered during hands-on testing were
equipment variation and score sheet errors. In most cases, we
were able to make allowances for equipment variation by developing
parallel forms of a test. Often, this involved omitting certain
steps that were unnecessary for one of the equipment models. In
other cases, parallel sets of steps were developed. We tried to
make this clear on our score sheets and in scorer training. On a
few occasions, we were unsuccessful. For example, in one
hands-on test for Radio/Teletype Operators, the scoring sheets for
one task included some steps to be scored for one type of equipment
and a different set to be scored for another type of equipment.
As a result, no subject should have had scores for all of the
steps. Yet, two cases had data for both sets of steps, creating
a unique problem of "too much" data rather than missing data. In
several other instances, a scorer had trouble understanding some
of the directions on the score sheets and left one or more steps
unscored.




Figure  1 

REASONS  WHY  DATA  WERE  SOMETIMES  INCOMPLETE 


HANDS-ON  DATA 

•  Anticipated Variation in Equipment
•  Unanticipated Variation in Equipment
•  Soldiers Not Available for Part or All of Scheduled Time
•  Equipment Breakdown or Nonavailability
•  Conditions Preventing Testing of Some Soldiers on Some Tasks
•  Scorer or Scoresheet Errors

RATING DATA

•  No Suitable Raters Available
•  Soldier Does Not Perform Some Kinds of Tasks
•  Rater Not Following Instructions

KNOWLEDGE TEST

•  Soldiers Not Available for Part or All of Scheduled Time
•  Soldiers Exceptionally Slow in Taking Test
•  Soldiers Not Following Instructions




Finally, a problem that plagued us throughout our testing
was that subjects often had other commitments or were called away
in the midst of tests. A subject might get halfway through a
test, then have to leave for a dentist appointment that had been
scheduled two or three months previously. These unavoidable
absences doubtless caused more missing data than any other factor
listed in Figure 1.


Amount of Missing Data

For any given instrument, data may be either partially
missing (i.e., the soldier failed to complete some items or steps)
or totally missing (i.e., the soldier was not available for a
testing session). Moreover, if data are partially missing, there
may be relatively small amounts of missing data or relatively
large amounts of missing data.
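
The four-way breakdown used in the tables that follow can be sketched as a small classifier. This is an illustrative sketch only: the function name is hypothetical, missing elements are represented by `None` as an assumption, and the 10% dividing line is taken from the table column headings.

```python
def missing_category(responses, small_cutoff=0.10):
    """Classify one soldier's record on one instrument.

    responses: list of item, step, or scale scores, with None marking
    a missing element.
    """
    n_missing = sum(1 for r in responses if r is None)
    if n_missing == 0:
        return "no missing"
    if n_missing == len(responses):
        return "all missing"
    fraction = n_missing / len(responses)
    if fraction <= small_cutoff:
        return "small amount missing"
    return "large amount missing"

# Hypothetical 20-item records.
complete = [1] * 20
one_omitted = [1] * 19 + [None]
mostly_blank = [1] * 5 + [None] * 15
not_tested = [None] * 20
for record in (complete, one_omitted, mostly_blank, not_tested):
    print(missing_category(record))
```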

Table 1 shows the extent of missing data for the school
knowledge (SK) tests. There were only a very few instances (1%)
where a soldier failed to take the test at all. There were also
very few soldiers (1%) with relatively large amounts of missing
data. There were, however, a significant number of cases (16%)
with a small number of omitted items.

Table  1  also  shows  small  differences  between  the  Batch  A  and 
Batch  Z  MOS  in  the  proportion  of  soldiers  not  tested  at  all.  For 
all  but  one  of  the  Batch  A  MOS,  the  percent  not  tested  is  above 
1%,  while  the  percent  not  tested  is  below  1%  for  all  but  one  of 
the  Batch  Z  MOS.  This  difference  is  a  direct  consequence  of  the 
fact  that  all  of  the  Batch  Z  testing  took  place  in  a  single  day 
while  the  Batch  A  testing  required  two  full  days  of  a  soldier's 
time. 


Table 2 shows the extent of missing data for the job knowledge
(JK) tests. (Subjects in Batch Z MOS did not receive job knowledge
or hands-on tests.) Again, there were very few instances (1%)
where soldiers were not tested at all. The proportions of soldiers
with relatively small (20%) and relatively large (3%) amounts of
missing data are slightly higher than for the SK tests, but
generally quite comparable.


The extent of missing data for the hands-on tests is shown
in Table 3. The number of soldiers not tested was again small
(1.8%). The number of soldiers with at least some missing data
was, in many cases, very large. For the most part, this was due
to equipment variation or failure.

Table  4  shows  the  extent  of  missing  data  for  the  rating 
measures.  A  scale  or  instrument  was  considered  present  if  at 
least  one  peer  or  at  least  one  supervisor  provided  a  rating.  With 




TABLE 2


NUMBER AND PERCENT OF CASES WITH INCOMPLETE
JK DATA FOR EACH BATCH A MOS


MOS      No Missing   < 10% Miss   > 10% Miss   All Miss    Total

11B          506          180            7           9        702
            72.08        25.64         1.00        1.28

13B          460          180           17          10        667
            68.97        26.99         2.55        1.50

19E          350          115           30           8        503
            69.58        22.86         5.96        1.59

31C          304           24           31           7        366
            83.06         6.56         8.47        1.91

63B          481          120           26          10        637
            75.51        18.84         4.08        1.57

64C          533          141            5           7        686
            77.70        20.55         0.73        1.02

71L          395          107            6           6        514
            76.85        20.82         1.17        1.17

91A          428           59            9           5        501
            85.43        11.78         1.80        1.00

95B          595           74           21           2        692
            85.98        10.69         3.03        0.29

TOTAL       4052         1000          152          64       5268



TABLE 3

NUMBER AND PERCENT OF CASES WITH INCOMPLETE
HANDS-ON DATA FOR EACH BATCH A MOS


TOTAL  1898  2865  612  93  5268 




the exception of the Job Task Ratings (JTR), the completion rates
were all quite high. The JTR scales provided a "cannot rate"
option that was counted as missing, and this accounts for most
instances of partially missing data. Tables 5 and 6 show the
same information for supervisors and peers alone. The percent of
soldiers with no ratings was quite a bit higher (8.4%) because no
appropriate peer or supervisor was available in many instances.

Table 7 shows the amount of missing JTR (Batch A MOS) and
CTR (Batch Z MOS) data by MOS. There was considerable variation
across MOS. For some MOS (e.g., Combat Engineers, MANPADS Crewman)
there were very high levels of completeness. However, for MOS
where soldiers tend to work in isolation from other soldiers in
their MOS and tend to perform only a subset of the tasks rated,
the incidence of missing data was significantly higher. The best
example is Administrative Specialist, where only 24% of the
subjects had complete data.

From the results presented thus far, it might be tempting to
conclude that, except for the JTR/CTR data, missing data was not
a significant problem in analyses of the Project A CV data.
Table 8 indicates that this is not the case. The table shows the
number of Batch A soldiers with different patterns of complete
and missing data across the four performance measurement methods.
Fewer than 15% of the cases in the entire sample have complete
data for all four methods. If the ratings data are set aside,
there are still fewer than 25% of the subjects with complete HO,
JK, and SK data. Similarly, ignoring the HO data still leaves
only about 42% of the CV subjects with complete data on the
remaining measures. Whether or not the sample of soldiers with
complete data is representative of the target population, the
sheer loss of statistical power associated with reduced sample
size would be unacceptable. Something had to be done.
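
The three percentages quoted in this paragraph follow directly from the first column of Table 8 (cases complete on both written knowledge tests). The sketch below simply redoes that arithmetic; the variable and function names are illustrative, and the counts are those printed in Table 8.

```python
# Table 8, first column: cases with complete data on both written tests,
# broken down by hands-on (HO) and ratings (RA) status.
complete_both_written = {
    ("HO complete", "RA complete"): 772,
    ("HO complete", "RA missing"): 526,
    ("HO missing", "RA complete"): 1436,
    ("HO missing", "RA missing"): 784,
}
TOTAL_CASES = 5268  # all Batch A soldiers in Table 8

def percent(count):
    """Express a case count as a percentage of the Batch A sample."""
    return 100.0 * count / TOTAL_CASES

# Complete on all four methods.
all_four = complete_both_written[("HO complete", "RA complete")]

# Ratings set aside: HO and both written tests complete.
ignoring_ratings = all_four + complete_both_written[("HO complete", "RA missing")]

# Hands-on set aside: ratings and both written tests complete.
ignoring_hands_on = all_four + complete_both_written[("HO missing", "RA complete")]

for label, count in [("all four methods", all_four),
                     ("ignoring ratings", ignoring_ratings),
                     ("ignoring hands-on", ignoring_hands_on)]:
    print(f"{label}: {count} cases, {percent(count):.2f}%")
```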

Treatment  of  Missing  Data 

The processing of missing data was approached in two stages.
In the first stage, we focused on one instrument at a time and
dealt with only those subjects who were missing a small amount of
data on the instrument under consideration. In the second stage,
we formulated procedures for dealing with subjects who were
missing a high percentage or all of the data on a given instrument.




TABLE 4


PERCENT OF CASES WITH MISSING DATA BY
RATING INSTRUMENT USING COMBINED
SUPERVISOR AND PEER RATINGS
(ALL MOS: N = 9430)


                        No       1-10%     > 10%      All
Instrument           Missing    Missing   Missing   Missing

Army-Wide BARS         98.3       0.2       0.0       1.5
MOS Specific BARS      97.0       0.3       0.9       1.8
Task Ratings           66.2      11.2      20.1       2.4
Combat Prediction      98.3       0.1       0.1       1.5
All Instruments        66.0      28.7       3.8       1.5

TABLE 5

PERCENT OF CASES WITH MISSING DATA BY
RATING INSTRUMENT FOR SUPERVISOR
RATINGS ONLY
(ALL MOS: N = 9430)


                        No       1-10%     > 10%      All
Instrument           Missing    Missing   Missing   Missing

Army-Wide BARS         90.3       0.9       0.3       8.5
MOS Specific BARS      82.7       2.3       5.3       9.8
Task Ratings           30.2      13.5      45.3      10.9
Combat Prediction      89.4       1.8       0.2       8.6
All Instruments        29.2      50.0      12.3       8.4



TABLE 6

PERCENT OF CASES WITH MISSING DATA BY
RATING INSTRUMENT FOR PEER
RATINGS ONLY
(ALL MOS: N = 9430)


                        No       1-10%     > 10%      All
Instrument           Missing    Missing   Missing   Missing

Army-Wide BARS         91.0       0.4       0.2       8.4
MOS Specific BARS      88.9       0.5       1.3       9.3
Task Ratings           48.1      11.0      30.0      11.0
Combat Prediction      90.4       0.9       0.2       8.6
All Instruments        47.5      34.2       9.9       8.4



TABLE 7


PERCENTAGE OF CASES WITH MISSING
TASK RATINGS USING COMBINED PEER
AND SUPERVISOR RATINGS


               No       1-10%     > 10%      All      Total
MOS         Missing    Missing   Missing   Missing      N

11B          71.51      14.39     12.39      1.71      702
13B          75.41       6.45     17.54      0.60      667
19E          68.79      14.51     16.30      0.40      503
31C          56.28      16.39     24.59      2.73      366
63B          63.27      10.99     22.92      2.83      637
64C          60.50       9.91     26.97      2.62      686
71L          23.93      18.29     53.89      3.89      514
91A          60.68      13.17     25.35      0.80      501
95B          70.38      11.85     17.63      0.14      692
12B          93.32       3.13      2.41      1.14      704
16S          91.49       5.32      3.19      0.00      470
27E          74.15       6.80     18.37      0.68      147
51B          84.26       5.56      8.33      1.85      108
54E          73.73      12.21     10.37      3.69      434
55B          69.42      12.71     16.15      1.72      291
67N          62.32      13.77     22.83      1.09      276
76W          61.22      14.49     20.20      4.08      490
76Y          49.05      11.59     31.43      7.94      630
94B          59.15      10.46     25.16      5.23      612

ALL MOS      66.18      11.20     20.22      2.40     9430




TABLE 8


NUMBER OF CASES WITH COMPLETE DATA FOR
EACH COMBINATION OF CRITERION INSTRUMENTS

BATCH A


Frequency           Complete   Comp K3   Miss K3   Missing
Percent             K3 & K5    Miss K5   Comp K5   K3 & K5     TOTAL

Complete HO & RA       772       189       122        58        1141
                     14.65      3.59      2.32      1.10       21.66

Comp HO Miss RA        526       130        72        29         757
                      9.98      2.47      1.37      0.55       14.37

Miss HO Comp RA       1436       364       215       125        2140
                     27.26      6.91      4.08      2.37       40.62

Missing HO & RA        784       241       125        80        1230
                     14.88      4.57      2.37      1.52       23.35

TOTAL                 3518       924       534       292        5268
                     66.78     17.54     10.14      5.54      100.00




Stage I: Missing Data within Each Instrument

Amount of missing data permitted. The first step in Stage I
was to decide how much missing data was too much. We examined
distributions of the amount of missing data and found a somewhat
bimodal distribution. Most soldiers had only a small number of
missing steps, items, or scales, but a smaller number had all or
nearly all elements missing. For each instrument, we picked a
percentage to be the dividing line between minimal and significant
amounts of missing data. For cases with minimal missing data, we
would take steps to fill in missing values so as to be able to
compute performance scores. For cases with significant amounts
of missing data, we would not attempt to compute performance
scores for the instrument in question.

In general, we sought to retain 90-95% of the soldiers
tested in each MOS, but to eliminate cases with more than 10%
missing elements. For the written tests (JK and SK), we were
able to set a 10% missing cutoff and still retain well over 95%
of the subjects in each MOS. For HO and each of the rating
instruments, a slightly more liberal cutoff of 15% missing was
chosen as the best balance between the desire to retain most of
the cases and the desire to limit strongly the number of values
that we would have to impute to achieve complete data. For the
HO data a two-stage rule was adopted. For each task tested, we
decided to generate a task score only if no more than 15% of the
steps were missing. We then computed overall hands-on scores
only if no more than 3 task scores (no more than 4 task scores
for 31C and 63B, where we had relatively small samples) were
missing.
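
The two-stage hands-on rule can be sketched as a pair of eligibility checks. This is a minimal sketch under stated assumptions: the function and constant names are hypothetical, unscored steps are represented by `None`, and the checks only decide whether a score may be computed (the missing steps themselves are filled in later by imputation, not here).

```python
RELAXED_MOS = {"31C", "63B"}  # MOS allowed one extra missing task score

def task_is_scorable(steps, max_missing_fraction=0.15):
    """Stage 1: a task score is generated only if no more than 15% of
    the task's steps are missing (None)."""
    if not steps:
        return False
    n_missing = sum(1 for s in steps if s is None)
    return n_missing / len(steps) <= max_missing_fraction

def hands_on_is_scorable(tasks, mos):
    """Stage 2: an overall hands-on score is computed only if no more
    than 3 task scores are missing (4 for 31C and 63B)."""
    allowed = 4 if mos in RELAXED_MOS else 3
    n_unscorable = sum(1 for steps in tasks if not task_is_scorable(steps))
    return n_unscorable <= allowed

# Hypothetical soldier: 15 ten-step tasks, four of them mostly unscored.
good_task = [1, 1, 0, 1, 1, 1, 1, 0, 1, None]   # 10% of steps missing
bad_task = [1, None, None, None, 1, 0, 1, 1, 1, 1]  # 30% of steps missing
tasks = [good_task] * 11 + [bad_task] * 4
print(hands_on_is_scorable(tasks, "11B"), hands_on_is_scorable(tasks, "31C"))
```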

Dropping unreliable responders. In addition to dropping
cases with too much missing data on an instrument, we also
developed rules for identifying and eliminating "unreliable" or
random responders on each instrument. Again the rules were
developed and adopted on an instrument-by-instrument basis. For
the written tests, a random response index was defined as the
correlation between the item score (1 for correct and 0 for
incorrect) and item difficulty (expressed as proportion of subjects
who answered the item correctly). For most examinees this
correlation was positive since there was a tendency to get the
easier items correct and miss the more difficult items. In a few
instances this correlation was essentially zero, suggesting
random responding. For these subjects, all of their responses
for that particular instrument were set to missing.
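
The random response index described above is an ordinary Pearson correlation computed within one examinee across items. The sketch below is illustrative: the function names and the example data are hypothetical, and the paper does not state the numerical cutoff used for "essentially zero", so none is built in.

```python
import math

def pearson(x, y):
    """Pearson correlation; returns None if either variable is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return None
    return sxy / math.sqrt(sxx * syy)

def random_response_index(item_scores, item_difficulty):
    """Correlation between one examinee's 0/1 item scores and the
    proportion of all subjects who answered each item correctly."""
    return pearson(item_scores, item_difficulty)

# Hypothetical six-item test; difficulty = proportion answering correctly.
difficulty = [0.9, 0.8, 0.7, 0.4, 0.3, 0.2]
careful = [1, 1, 1, 0, 0, 0]    # gets the easy items, misses the hard ones
haphazard = [1, 0, 0, 1, 1, 0]  # pattern unrelated to difficulty

print(random_response_index(careful, difficulty))
print(random_response_index(haphazard, difficulty))
```

An examinee who answers every item correctly (or incorrectly) has no variance in item scores, so the index is undefined; the sketch returns `None` in that case rather than guessing how such cases were handled.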

Random responding was not a concern for the hands-on data.
The data sheets were filled out by trained (and monitored) NCOs
and not by the examinees themselves. There was no indication
that any subjects intentionally responded poorly or randomly in
front of the NCO scorers. No screening for unreliable responses
in the hands-on data was conducted.

For the rating data, we screened for unreliable raters
rather than unreliable examinees. We constructed reliability
indices for each rater by comparing their ratings with the average
of all other raters' ratings of the same soldiers on the same
scales. Both mean difference and correlational indices were used
in identifying "outliers" among the raters.
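
The two rater indices can be sketched as follows. This is an illustration under assumptions: the function name and the example ratings are hypothetical, each list position stands for one soldier-by-scale cell, and the cutoffs used to declare a rater an outlier are not given in the paper and so are not included.

```python
import math

def rater_indices(own_ratings, others_mean):
    """Compare one rater with the average of all other raters on the same
    soldier-by-scale cells.

    Returns (mean difference, correlation). The correlation is None when
    either set of ratings has no variance.
    """
    n = len(own_ratings)
    mean_diff = sum(a - b for a, b in zip(own_ratings, others_mean)) / n
    mean_own = sum(own_ratings) / n
    mean_oth = sum(others_mean) / n
    cross = sum((a - mean_own) * (b - mean_oth)
                for a, b in zip(own_ratings, others_mean))
    ss_own = sum((a - mean_own) ** 2 for a in own_ratings)
    ss_oth = sum((b - mean_oth) ** 2 for b in others_mean)
    if ss_own == 0 or ss_oth == 0:
        return mean_diff, None
    return mean_diff, cross / math.sqrt(ss_own * ss_oth)

# Hypothetical 7-point ratings on four soldier-by-scale cells.
others = [4.0, 5.0, 6.0, 3.0]
lenient_but_consistent = [5, 6, 7, 4]   # one point high, same ordering
flat_rater = [4, 4, 4, 4]               # no differentiation at all

print(rater_indices(lenient_but_consistent, others))
print(rater_indices(flat_rater, others))
```

The mean difference picks up leniency or severity, while the correlation picks up raters whose ordering of soldiers bears no relation to that of their colleagues.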

Establishing separate tracks to account for equipment
differences. For several MOS, the hands-on scoring differed for
different equipment. In order to achieve comparable scores
across these equipment differences, we separated the examinees
into separate "tracks" corresponding to the different variations
in equipment. (For Military Police, for example, females use and
were tested on a .38 caliber hand gun while males use and were
tested on a .45 caliber hand gun.) We found at most minimal
differences between track samples on those tasks that were scored
the same, so we achieved comparable scoring by standardizing
scores computed from tracked tasks separately for each track
sample. Scores for each track were standardized to have a mean
and standard deviation that matched the original overall mean and
standard deviation for the score in question.
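
Within-track standardization of this kind can be sketched as a z-score within each track followed by rescaling to the overall mean and standard deviation. The sketch is illustrative only: the function name, track labels, and scores are hypothetical, and the use of the population standard deviation is an assumption.

```python
import statistics

def standardize_by_track(scores, tracks):
    """Rescale scores within each track so that every track ends up with
    the overall (all-track) mean and standard deviation."""
    overall_mean = statistics.fmean(scores)
    overall_sd = statistics.pstdev(scores)

    # Collect each track's scores to get its own mean and SD.
    by_track = {}
    for score, track in zip(scores, tracks):
        by_track.setdefault(track, []).append(score)
    track_stats = {t: (statistics.fmean(v), statistics.pstdev(v))
                   for t, v in by_track.items()}

    rescaled = []
    for score, track in zip(scores, tracks):
        mean_t, sd_t = track_stats[track]
        z = (score - mean_t) / sd_t if sd_t > 0 else 0.0
        rescaled.append(overall_mean + overall_sd * z)
    return rescaled

# Hypothetical task scores for two equipment tracks.
raw = [10.0, 12.0, 14.0, 20.0, 22.0, 24.0]
track = ["A", "A", "A", "B", "B", "B"]
adjusted = standardize_by_track(raw, track)
print(adjusted)
```

After rescaling, a soldier's standing is expressed relative to others tested on the same equipment, which is what makes scores comparable across tracks.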

Number of subjects dropped for missing data or unreliable
responses. The number of cases deleted due to too much missing
data and/or to apparent random responding for the SK tests are
shown in Table 9. Similar results for the JK tests are shown in
Table 10 and the numbers of cases deleted due to too much missing
data on the HO tests are shown in Table 11. Elimination of
unreliable raters did not result in the loss of rating data for
any individual subjects. In all cases where raters were
eliminated, there were other raters providing data on these
subjects. (Where there were no other raters, the rater in question
was not eliminated because there was no basis for estimating the
reliability of the ratings.)

Imputing  missing  values.  After  dropping  cases  with  too  much 
missing  data  or  with  random  responses,  we  imputed  values  for  the 
remaining  missing  data  so  that  summary  performance  scores  could 
be  computed. 

We considered several options for imputing scores. The
first was to compute the subject's mean on the variables that
were present and then substitute this mean for each of the missing
variables. If a subject passed 80% of the items or steps for a
particular task or test, we could substitute a value of .8 for
any missing item or step scores. This is equivalent to defining
the total score as the mean of the values present, which was done
in the field test. The problem with this approach was that items
and steps differed considerably in difficulty. There were cases




TABLE 9


NUMBER OF CASES WITH SK DATA DELETED DUE TO
TOO MUCH MISSING OR RANDOM RESPONSE


          MISSING   RANDOM             TOTAL     TOTAL    PERCENT
MOS        > 10%     RESP     BOTH    DROPPED      N      DROPPED

11B           2        8        0        10       694       1.4
13B-S         3       10        0        13       536       2.4
13B-T         2        2        0         4       117       3.4
19E           4        6        0        10       495       2.0
31C           1        5        0         6       355       1.7
63B          10        5        0        15       627       2.4
64C           3        7        0        10       679       1.5
71L           3        5        0         8       501       1.6
91A           2        5        0         7       486       1.4
95B           4        9        0        13       687       1.9
12B           5       11        0        16       698       2.3
16S           0        1        0         1       469       0.2
27E           2        3        0         5       147       3.4
51B           4        0        1         5       107       4.7
54E           2        4        0         6       432       1.4
55B          15        1        0        16       289       5.5
67N           5        0        0         5       276       1.8
76W           9        5        1        15       488       3.1
76Y          19       10        0        29       625       4.6
94B          14       20        0        34       604       5.6



TABLE 10


NUMBER OF CASES WITH JK DATA DELETED DUE TO
TOO MUCH MISSING OR RANDOM RESPONSE


          MISSING   RANDOM             TOTAL     TOTAL    PERCENT
MOS        > 10%     RESP     BOTH    DROPPED      N      DROPPED

11B           9        6        0        15       693       2.2
13B          16        1        1        18       657       2.7
19E          29        6        1        36       495       7.3
31C          31        2        0        33       359       9.2
63B          26        4        1        31       627       4.9
64C           7        4        0        11       679       1.6
71L           6        1        0         7       508       1.4
91A           9        4        0        13       496       2.6
95B          22        3        0        25       690       3.6

TABLE  11 

NUMBER OF CASES WITH HANDS-ON DATA DELETED
DUE TO TOO MUCH MISSING




where the omitted items/steps were considerably more (or less)
difficult than the items/steps that were completed, so systematic
bias would be introduced by substituting the examinee's mean.
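
As a hedged illustration (not code from the original analyses), the short Python sketch below checks the equivalence stated above: filling each missing item with the examinee's own mean yields the same total as taking the mean of the items present. The function names and the example scores are hypothetical.

```python
def mean_of_present(scores):
    """Mean of the item scores that are present (None marks a missing item)."""
    present = [s for s in scores if s is not None]
    return sum(present) / len(present)


def mean_after_substitution(scores):
    """Mean over all items after filling each missing item with the
    examinee's own mean on the items present."""
    fill = mean_of_present(scores)
    filled = [fill if s is None else s for s in scores]
    return sum(filled) / len(filled)


# Hypothetical examinee: passed 4 of the 5 steps attempted, 5 steps missing.
example = [1, 1, 1, 1, 0, None, None, None, None, None]
```

Both functions give the same total for the same examinee, which is why the approach cannot reflect the fact that the omitted steps may have been harder or easier than the completed ones.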

The second option that we considered was to substitute the
variable (item, step, scale) mean for all missing values on that
variable. This option was rejected because it would reduce
individual differences. Subjects performing at different levels
should have different estimates for the missing items.

The option used to fill in missing values was a procedure
that had been developed for the National Center for Education
Statistics (now the Center for Education Statistics) known as
PROC IMPUTE (Wise & McLaughlin, 1980). Several features of PROC
IMPUTE made it preferable to other readily available options for
filling in the missing CV values.

First, PROC IMPUTE uses regression equations to predict
missing values. Each missing value is predicted from other
values for the subject in question so that individual differences
are retained. The regression coefficient and intercept vary from
item to item so that differences in item difficulty are also
reflected in the predicted values.

Second, PROC IMPUTE adds a random variable with variance
equal to the error of estimate for predicting the missing value.
If such a random variable is not added, the imputed values are
more highly correlated with values on other variables in comparison
with nonimputed values.
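
The first two features can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the PROC IMPUTE program itself: the function name is hypothetical, the predictor columns are assumed to be complete, and the added noise is assumed to be normal with a standard deviation equal to the residual standard error of the regression.

```python
import numpy as np


def impute_with_noise(X, target, rng):
    """Fill NaNs in column `target` of X by regressing it on the other
    columns and adding random noise scaled to the standard error of estimate.

    Assumptions (not from the source): predictors are complete and the
    noise is drawn from a normal distribution.
    """
    X = np.array(X, dtype=float)          # work on a copy
    others = [j for j in range(X.shape[1]) if j != target]
    observed = ~np.isnan(X[:, target])
    missing = ~observed

    # Design matrix with an intercept column; fit on observed cases only.
    A = np.column_stack([np.ones(X.shape[0]), X[:, others]])
    coef, *_ = np.linalg.lstsq(A[observed], X[observed, target], rcond=None)

    # Residual standard error of the fitted regression.
    resid = X[observed, target] - A[observed] @ coef
    dof = max(int(observed.sum()) - A.shape[1], 1)
    se = np.sqrt(resid @ resid / dof)

    # Predicted value plus random error for each missing case.
    noise = rng.normal(0.0, se, int(missing.sum()))
    X[missing, target] = A[missing] @ coef + noise
    return X
```

Without the noise term, every imputed value would lie exactly on the regression surface, which is the source of the inflated correlations described above.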

Third, PROC IMPUTE employs a sequential strategy that
maintains relationships between variables when more than one
value is imputed for the same examinee. A two-stage approach is
employed so that the first variable is imputed from nonmissing
values. The second (and subsequent) variable(s) are imputed from
the nonmissing values plus the imputed value for the first
variable. After all initial imputations, values are reimputed in
a second pass where all of the initially imputed values participate
in the reimputation of each missing value.
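
The sequential, two-pass strategy can be sketched as below. This is an assumption-laden illustration rather than the actual program: the function names are hypothetical, plain least-squares regression stands in for the prediction step, and the random error term is left out so that the ordering logic is easier to follow.

```python
import numpy as np


def _fit_predict(predictors, y, observed):
    """Fit y on predictors (plus intercept) using observed rows only,
    then return predictions for every row."""
    A = np.column_stack([np.ones(len(y)), predictors])
    coef, *_ = np.linalg.lstsq(A[observed], y[observed], rcond=None)
    return A @ coef


def two_pass_impute(X):
    """Sequential imputation followed by a reimputation pass."""
    X = np.asarray(X, dtype=float)
    missing = np.isnan(X)
    work = X.copy()
    n_cols = X.shape[1]
    complete = [j for j in range(n_cols) if not missing[:, j].any()]
    incomplete = [j for j in range(n_cols) if missing[:, j].any()]

    # Pass 1: the first incomplete variable is imputed from complete
    # variables; each later one also uses the variables already filled in.
    for j in incomplete:
        pred = _fit_predict(work[:, complete], X[:, j], ~missing[:, j])
        work[missing[:, j], j] = pred[missing[:, j]]
        complete.append(j)

    # Pass 2: reimpute each originally missing value using all other
    # variables, including the values imputed in pass 1.
    for j in incomplete:
        others = [k for k in range(n_cols) if k != j]
        pred = _fit_predict(work[:, others], X[:, j], ~missing[:, j])
        work[missing[:, j], j] = pred[missing[:, j]]
    return work
```

The second pass lets every variable contribute to every imputed value, so the result depends less on the order in which variables were first filled in.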

Finally, PROC IMPUTE models nonlinear relationships between
the predicted and actual values. If the actual values are
discrete, PROC IMPUTE provides discrete values for the missing
elements as well. Table 12 illustrates the final step in PROC
IMPUTE. The predicted values were divided into six equal intervals
to define predicted "levels". There were from 61 to 92 cases at
each predicted level for whom actual technical skill ratings were
available. The distribution (in percentages) of actual scores
for each predicted level is shown. For each soldier with a
missing technical skill rating, a predicted level is computed.
(Actually, the program interpolates between predicted levels.) A
uniformly distributed random number between 0 and 100 is generated
and mapped onto the actual levels using the cumulative
distribution of actual scores for the predicted level. (Again
the program actually interpolates between levels.) The actual
level scores are then transformed back to the original units.


Table 12

Distribution of Technical Skill Ratings
for Each Predicted Level

                           Percent at Each Actual Level
Predicted    Total #
Level        of cases       1      2      3      4      5

    1           67         15     57     18      0      0
    2           61          0     21     77      2      0
    3           92          0      7     65     28      0
    4           89          0      0     40     59      1
    5           92          0      0      8     91      1
    6           71          0      0      3     52     45
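
The mapping step described for Table 12 can be sketched in Python as follows. This is a simplified illustration, not the PROC IMPUTE code: the function name is hypothetical, the percentages are the Table 12 row for predicted level 3, and the interpolation between levels that the program performs is omitted.

```python
import bisect
import random

# Percent of cases at actual levels 1..5 for predicted level 3 (Table 12).
PERCENT_AT_ACTUAL = [0, 7, 65, 28, 0]


def draw_actual_level(percents, u):
    """Map a uniform number u in [0, 100) onto an actual level (1-based)
    using the cumulative distribution of `percents`."""
    cumulative = []
    total = 0
    for p in percents:
        total += p
        cumulative.append(total)
    # The first level whose cumulative percent exceeds u receives the draw.
    return bisect.bisect_right(cumulative, u) + 1


def impute_level(percents, rng=random):
    """Draw one imputed actual level for a soldier at this predicted level."""
    return draw_actual_level(percents, rng.random() * 100.0)
```

Because the draw follows the observed distribution, imputed ratings stay discrete and reproduce the spread of actual ratings at each predicted level instead of collapsing onto a single predicted value.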

PROC IMPUTE was used in all instances except one. For the
written tests, a distinction was made between internal omits
(prior to the last item answered) and items that were not reached
(omits after the last item answered). For internal omits, we
assumed that the examinee did not know the answer and substituted
a score equal to the guessing rate (e.g., .2 for a 5 option
item). If the actual proportion passing the item was lower than
the guessing rate, the proportion passing was used instead. We
made no assumptions regarding items not reached since the examinee
may not have had time to demonstrate knowledge of the item. Not
reached items were imputed with PROC IMPUTE as were all missing
hands-on steps and rating scales.
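
The written-test rule can be sketched as below. The function names and the response encoding (None for an omitted item) are hypothetical; only the rule itself is taken from the paragraph above.

```python
def classify_omits(responses):
    """Split omitted items (None) into internal omits and not-reached items.

    Internal omits come before the last answered item; not-reached items
    come after it. Returns two lists of item indices.
    """
    answered = [i for i, r in enumerate(responses) if r is not None]
    last = answered[-1] if answered else -1
    internal = [i for i, r in enumerate(responses) if r is None and i < last]
    not_reached = [i for i, r in enumerate(responses) if r is None and i > last]
    return internal, not_reached


def internal_omit_score(n_options, proportion_passing):
    """Score for an internal omit: the guessing rate, or the item's
    proportion passing when that is lower."""
    return min(1.0 / n_options, proportion_passing)
```

Items returned in the not-reached list would be passed on to the regression-based imputation rather than scored with this rule.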

Tables 13, 14, and 15 show the changes in summary
statistics that resulted from the Stage I screening (pruning),
standardizing (by track), and imputing for three different MOS.
Initial totals were computed using means of available data. The
sample sizes dropped slightly due to screening out random
responders and cases with too much missing data. Only small
changes in means, standard deviations, reliabilities, and
correlations resulted from the Stage I procedures. (Mean shifts
for the first three scales should be compared against a standard
deviation of 10.0, while the three rating factors were on a 7
point scale with a standard deviation of just under 1.0.)

Stage II: Missing Instruments

After cases were dropped or missing values were filled in on
an instrument-by-instrument basis, we were ready to compute




Table 13

STAGE I RESULTS

CHANGES IN STATISTICAL CHARACTERISTICS OF SUMMARY PERFORMANCE MEASURES
RESULTING FROM PRUNING, IMPUTING (EXCEPT RATINGS), AND STANDARDIZING

MOS 11B: INFANTRY

                                                  CORRELATION WITH:
PERF MEASURE           N   MEAN     SD   REL    SK   JK   HO   A1   A2   A3

SK TOTAL SCORE       -11    0.6   -0.7   -01     .   00   05   00   02  -01
JK TOTAL SCORE       -15    0.1   -0.3    01    00    .   05   01   01   00
HANDS-ON TOTAL        -4   -0.4   -0.9    02    05   05    .   05   02   05
AWB1: TECH/EFFORT     -1   -.01    .01    02    00   00   05    .   00   00
AWB2: INTEG&CONTR     -1   -.01    .01    02    02   01   02   00    .   01
AWB3: APPEARANCE      -1    .00    .01    02   -01   00   06   00   01    .


Table 14

STAGE I

CHANGES IN STATISTICAL CHARACTERISTICS OF SUMMARY PERFORMANCE MEASURES
RESULTING FROM PRUNING, IMPUTING (EXCEPT RATINGS), AND STANDARDIZING

MOS 63B: TRUCK MECHANIC

                                                  CORRELATION WITH:
PERF MEASURE           N   MEAN     SD   REL    SK   JK   HO   A1   A2   A3

SK TOTAL SCORE       -15    0.8   -0.7   -01     .   02   05   02   01  -03
JK TOTAL SCORE       -31    0.3   -0.4    00    02    .   08   00   01  -01
HANDS-ON TOTAL       -40   -0.3   -1.2   -03    05   08    .  -02  -01  -06
AWB1: TECH/EFFORT    -02    .01    .00    02   -02   00  -02    .   00   01
AWB2: INTEG&CONTR    -02    .01    .00    01    00   01  -01   00    .   00
AWB3: APPEARANCE     -02    .01    .00    02   -03  -01  -06   01   00    .

222 


Table 15

STAGE I

CHANGES IN STATISTICAL CHARACTERISTICS OF SUMMARY PERFORMANCE MEASURES
RESULTING FROM PRUNING, IMPUTING (EXCEPT RATINGS), AND STANDARDIZING

MOS 71L: CLERK

                                                  CORRELATION WITH:
PERF MEASURE           N   MEAN     SD   REL    SK   JK   HO   A1   A2   A3

SK TOTAL SCORE       -03    0.4   -0.3     ?     .    ?   03    ?    ?    ?
JK TOTAL SCORE       -07    0.1      ?    00    01    .   02   01   01    ?
HANDS-ON TOTAL       +06   -7.0      ?     ?    03    ?    .    ?    ?    ?
AWB1: TECH/EFFORT     00      ?      ?    00    02   01  -01    .    ?   00
AWB2: INTEG&CONTR     00    .00    .00    00    02   01   00   00    .    ?
AWB3: APPEARANCE      00    .00    .00    00   -01   00  -01   00    ?    .

? = entry not legible in the source.

223 


Table 16

NUMBER OF CASES MISSING EACH INSTRUMENT
(After Stage I Screening and Imputation)

                         11B   13B   19E   31C   63B   64C   71L   91A   95B

Total N                  702   667   503   366   637   686   514   501   692
Missing Hands-On          20    55    29    25    68    46    20    27    18
Missing Job Kn            24    29    44    40    41    13    13    25    17
Missing Scho Kn           18    28    18    17    25    21    22    13     8
Missing AW Bars            7     2     1     8    12    11     3     0    13
Missing MOS Bars           9    12     3     9    18    23     8     0     8
Missing Comb Pred          7     2     1     8    12    11     3     0    12
Missing A1: Awards        14    24    13    13    11    14    11     4     ?
Missing A1: Phys Read     63    93    53    30    80    81    60    59    57
Missing A4: Arts. 15      23    28    16    14    11    14    15    14     4
Missing A5: Prom Rt      109   143    83    62    97    86    79    61    84

Total Complete           512   406   335   241   411   486   355   374   513
% Complete              72.9  60.9  66.6  65.9  64.5  70.9  69.1  74.7  74.1

Final Counts After Stage II

Total N                  693   656   490   356   615   675   506   492   686
% of Original           98.7  98.4  97.4  97.3  96.6  98.4  98.4  98.2  99.1

? = one entry in the Awards row is missing from the source for the last
four MOS; the alignment of the three values shown there is uncertain.

224 


Table 17

STAGE II

CHANGES IN STATISTICAL CHARACTERISTICS OF SUMMARY PERFORMANCE MEASURES
RESULTING FROM STAGE II IMPUTATIONS

MOS 11B: INFANTRY

                                            CORRELATION WITH:
PERF MEASURE           N   MEAN     SD    SK   JK   HO   A1   A2   A3

SK TOTAL SCORE       +11    -.1    +.1     .   01  -02   02   01   01
JK TOTAL SCORE       +16    -.5    +.1    01    .   00   01   00   01
HANDS-ON TOTAL       +15   -2.0    -.5   -02   00    .   02   01   03
AWB1: TECH/EFFORT      0     .0    .01    02   01   02    .   00   00
AWB2: INTEG&CONTR     +6   -.01   +.01    01   00   01   00    .   00
AWB3: APPEARANCE      +6    .00    .00    01   01   03   00   00    .

Table 18

STAGE II

CHANGES IN STATISTICAL CHARACTERISTICS OF SUMMARY PERFORMANCE MEASURES
RESULTING FROM STAGE II IMPUTATIONS

MOS 63B: TRUCK MECHANIC

                                            CORRELATION WITH:
PERF MEASURE           N   MEAN     SD    SK   JK   HO   A1   A2   A3

SK TOTAL SCORE       +13    -.4    -.0     .   00   01   01   00   00
JK TOTAL SCORE       +25    1.0     .0    00    .   02   01   00   02
HANDS-ON TOTAL       +49    -.9    -.7    01   02    .   00   00   05
AWB1: TECH/EFFORT      0    .00    .00    00   00   00    .   00   00
AWB2: INTEG&CONTR     +6    .00    .00    00   00   00   00    .   00
AWB3: APPEARANCE      +9    .00    .01    00   02   05   00   00    .

225 


Table 19

STAGE II

CHANGES IN STATISTICAL CHARACTERISTICS OF SUMMARY PERFORMANCE MEASURES
RESULTING FROM STAGE II IMPUTATIONS

MOS 71L: CLERK

                                            CORRELATION WITH:
PERF MEASURE           N   MEAN     SD    SK   JK   HO   A1   A2   A3

SK TOTAL SCORE       +11     .0     .0     .   00  -03   00   00   00
JK TOTAL SCORE        +6     .8     .0    01    .  -01   00  -02   00
HANDS-ON TOTAL       +18   -4.3   -1.7   -03  -01    .  -01   00   02
AWB1: TECH/EFFORT      0    .00    .00    00   00  -01    .   00   00
AWB2: INTEG&CONTR     +7    .00    .00    00  -02   00   00    .   01
AWB3: APPEARANCE       9    .00    .00    00   00  -02   00   01    .



overall performance scores that combined information from the
different measurement methods. The decision at this stage was
whether to estimate individual scores if only partial data were
available for the individual. We decided on a 50% rule. An
examinee had to have data on at least half of the instruments
going into a particular performance construct before we would
estimate a score on the performance construct. Where 50% or
fewer of the pieces were missing, PROC IMPUTE was again used to
fill in the missing pieces.
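
The 50% rule can be expressed as a small check. The function name and the convention of marking a missing instrument with None are hypothetical; the threshold follows the rule as stated above.

```python
def can_estimate_construct(instrument_scores):
    """Return True when at least half of the instruments feeding a
    performance construct are present (None marks a missing instrument)."""
    if not instrument_scores:
        return False
    present = sum(score is not None for score in instrument_scores)
    return 2 * present >= len(instrument_scores)
```

For example, an examinee with one of two instruments present would qualify for imputation of the other, while an examinee with only one of three present would not receive a score on that construct.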

Table 16 shows the number of soldiers in each MOS who had
missing values for each instrument after the completion of the
Stage I imputations and screening. In most instances, the number
of missing cases was quite small (1 or 2%). The chief exceptions
were two of the administrative measures. (Administrative measures
were not included in Stage I imputations because they do not
include a large number of component parts.) Physical Readiness
test scores were missing for 10 to 15% of the examinees. In most
instances, peer and supervisor ratings of physical fitness were
available for these same examinees. Similarly, Promotion Rate
Deviation scores were missing for a significant number of cases
(15%). This was primarily due to problems in retrieving Accession
file information needed to compute time-in-service. For the most
part, variation in promotion rates among first tour enlisted
soldiers reflected instances where disciplinary problems led to
delays in promotions. Such delays were predicted fairly well from
ratings of self control and integrity and from the administrative
index of disciplinary actions.

Tables 17, 18, and 19 show changes in summary statistics
that resulted from Stage II imputations for the same three MOS as
before. Again only small changes resulted. There was a slight
drop in hands-on means, because soldiers with missing hands-on
scores tended to score well below average on other measures.

Summary

The  decision  rules  and  imputation  procedures  used  with  the 
CV  data  were  successful  in  allowing  us  to  develop  performance 
scores  for  a  very  high  proportion  of  the  soldiers  tested.  Based 
on  the  available  evidence,  we  have  no  reason  to  believe  that  any 
significant  distortions  were  introduced  while  achieving  this 
goal.  Relatively  few  values  were  imputed  at  all.  Where  imputation 
was  necessary,  it  was  done  with  great  care. 

The  apparent  ease  of  imputation  procedures  should  not, 
however,  lead  us  to  relax  our  data  collection  procedures  in  the 
future.  Lessons  learned  from  investigation  of  the  reasons  for 
missing  data  will  be  used  to  modify  data  collection  procedures 
for  the  Project  A  longitudinal  validation  so  as  to  further  reduce 
the  amount  of  missing  data. 


227 


REFERENCE: 

Wise, L. L., & McLaughlin, D. H. (1980). Guidebook for the
imputation of missing data. Palo Alto, CA: American
Institutes for Research.


228 


DEVELOPMENT  OF  PROJECT  A  JOB  PERFORMANCE  MEASURES 


Charlotte  Campbell 

Human  Resources  Research  Organization 
Walter C. Borman

Personnel  Decisions  Research  Institute 
Daniel  C.  Felker 

American  Institutes  for  Research 
Pat  Ford 

Maria  De  Vera  Park 

Human  Resources  Research  Organization 
Elaine  C.  Pulakos 

Personnel  Decisions  Research  Institute 

Barry  J.  Riegelhaupt 
Human  Resources  Research  Organization 

Michael  G.  Rumsey 
U.S.  Army  Research  Institute 


Presented  at  the  Annual  Conference  of  the 
Society  for  Industrial  and  Organizational  Psychology 

Atlanta,  Georgia 

April  1987 


The  views  expressed  in  this  paper  are  those  of  the  authors  and  do  not 
necessarily  reflect  the  official  opinions  and  policies  of  the  U.S.  Army 
Research  Institute  or  the  Department  of  the  Army. 


229 


DEVELOPMENT  OF  PROJECT  A  JOB  PERFORMANCE  MEASURES 


Charlotte  H.  Campbell,  Walter  C.  Borman,  Daniel  C.  Felker, 

Pat  Ford,  Maria  de  Vera  Park,  Elaine  C.  Pulakos, 

Barry J. Riegelhaupt, and Michael G. Rumsey

You have heard from the previous presenters about the overall objectives
of Project A, and about the predictor development. The purpose of
this  paper  is  to  describe  the  objectives,  procedures,  and  products  of  the 
criterion  development  work. 

The  overall  strategy  for  performance  (i.e.,  criterion)  measurement  in 
Project  A  was  to  define  the  total  domain  of  Army  entry-level  enlisted 
personnel performance and then develop reliable and valid measures of all
of  the  major  components.  The  specific  measures  would  be  used  as  criteria 
against  which  to  validate  the  predictor  measures. 

In  defining  the  performance  domain,  we  began  with  two  assumptions.  The 
first  is  that  job  performance  is  multidimensional.  There  is  no  one 
attribute,  outcome,  or  factor  that  can  be  labeled  as  job  performance.  The 
second assumption is that job performance is manifested by a wide variety of
behaviors  or  activities,  things  people  do,  that  are  judged  to  be  important 
for  accomplishing  the  goals  of  the  organization.  Each  of  these  activities 
probably  requires  different  knowledges  and  skills  which  are  in  turn  most 
likely  a  function  of  different  abilities. 

For any particular job, one fundamental task in defining the performance
domain is to describe the basic factors that comprise performance.
For the population of entry-level positions in the Army, we postulated that
there  are  two  major  types  of  job  performance  factors.  The  first  is  composed 
of  performance  components  that  are  specific  to  a  particular  job,  such  as 
typing  for  the  administrative  specialists  or  loading  the  tank  gun  for  tank 
crewmen;  these  we  have  labeled  "job-specific"  criterion  factors.  The  second 
type  of  performance  includes  components  that  are  defined  and  measured  in  the 
same  way  for  every  job  (Borman,  Motowidlo,  Rose,  &  Hanser,  1985).  These 
have  been  referred  to  as  "Army-wide"  criterion  factors.  Examples  might  be 
proficiency  on  the  tasks  for  which  every  soldier  is  responsible,  or 
demonstrating  peer  leadership  or  support. 

The initial working model of total performance viewed performance as
multidimensional  within  the  job-specific  and  Army-wide  factors.  The  job 
analysis  and  criterion  construction  methods  were  designed  to  "discover"  the 
content  of  these  factors  via  a  comprehensive  description  of  the  total 
performance  domain,  several  iterations  of  data  collections,  and  the  use  of 
multiple  methods  for  identifying  and  measuring  the  basic  performance 
factors. 

Defining  the  Job  Content  Domain 

The  definition  of  the  job  content  domain  was  approached  from  several 
angles,  including  collection  of  critical  incidents,  review  of  Army  job  and 
task  analyses,  and  review  of  Army  training  programs. 


230 


Critical incidents.


Through the conduct of critical incident workshops, Army personnel
provided hundreds of critical incidents of specific task performance within
each focal job, and thousands of critical incidents describing performance
behaviors that have a general, not job-specific, referent. These large
samples of job behaviors were translated into dimensions which identify both
job-specific and Army-wide performance factors.

Army job and task analyses.


The Army maintains complex and definitive job and task analyses for
every enlisted job. These include lists of the tasks required of all
soldiers, regardless of their jobs, and provide step-by-step descriptions
of how each task is to be performed, under what conditions, and to what
standard. Another part of the system lists the tasks required for soldiers
in the specific jobs and provides similar detailed task analyses. The Army
Occupational Survey Programs are task inventories for specific jobs, which
are administered periodically to soldiers in the jobs to determine which of
hundreds of tasks and activities are performed by soldiers at various levels
within the Army.

For each job, a data bank of task statements was accumulated from the
integration of these sources, and the individual task statements were edited
to determine if they indeed focused on observable job tasks, if they were
redundant or overlapped with other tasks, if they were required only for
soldiers in restricted or specialized assignments, and if they were at the
same level of generality. Army job experts reviewed these edited lists to
determine whether they provided a complete picture of the job requirements.
The result was a task-based definition of the job performance domain.

Army  training  programs. 


Prior to beginning work in any job, soldiers attend training courses
that cover both basic Army soldiering skills and job-specific skills. As a
matter of Army policy, training must be job-related; therefore examination
of the training curricula should provide another view of the domain of job
performance. Working with each of the schools where this training is
developed and/or administered for the 19 jobs, we developed descriptions of
the objectives and content of the training curricula. Job and task
analyses, described above, were used in conjunction with these descriptions
to develop detailed descriptions of training for each job. What was
produced was a thorough analysis of the objectives, curriculum, and
assessment procedures for the key schools.


231 


Representing  and  Measuring  the  Job  Content  Domain 

The criterion development work was guided by the desire to cover as
many bases as possible relative to the population of criterion measures that
it is possible to collect. We know a lot more about predictor constructs
than we do about job performance constructs. There are volumes of research
on the former, and almost none on the latter. For personnel psychologists
it is almost second nature to talk about predictors in terms of constructs.
However, investigation of job performance constructs seems limited to those
few studies dealing with synthetic validity and those using the critical
incidents format to develop performance factors. Relatively little
attention has been given to conceptualizing performance in clerical,
technical, or skilled jobs. Because we know so little about the underlying
structure of job performance, we used every bit of measurement technology we
had.


Our use of the technology was, we hoped, even-handed with respect to
methods of defining performance and developing measures. We would be
hard-pressed to defend placing the criterion variables on some continuum from
immediate, through intermediate, to ultimate as a means for portraying their
relative importance or functional interrelationships. For example, although
there are good reasons for developing hands-on (job sample) performance
measures, we would not be willing to defend hands-on performance scores as
the "most ultimate" measure. And although job analyses based on critical
incidents enjoy great respect and intuitive appeal, we would not propose
these analyses as the "most valid" definitions of performance requirements.

Although our efforts involved intensive examination of the job-specific
domain for all of the 19 jobs, intensive measurement was to be focused on
nine of those jobs; fewer job-specific criterion measures were to be
developed for the other ten jobs.

Analysis of the critical incidents led to the development of two sets
of behaviorally anchored rating scales (BARS). One set, which was based on
those behavioral examples which were tied to performance of job-specific
activities, consists of six to twelve scales for each of the nine
intensively studied jobs. The other set was developed from the
non-job-specific examples, and consists of 11 Army-wide scales, which would
be used to assess performance of soldiers in all 19 jobs. On these and on all
other rating scales, soldiers would be rated by peers and supervisors.
Figure 1 presents an example of the job-specific dimensions for one of the
nine jobs, and lists the Army-wide dimensions which applied to all jobs. It
should be noted that what we developed were behavioral summary scales,
containing anchors that represent the behavioral content of all performance
incidents reliably retranslated by Army job experts for that particular
level of effectiveness.


232 


The task-based domain lists were clustered by Army personnel
with experience in each of the nine jobs, who also provided judgments of the
importance of each task and the expected performance level and variability
of each task for entry-level soldiers. Other job experts then sampled 30
tasks from each domain using the clusters and judgments. For each of the 30
tasks that they selected for each job, we developed multiple-choice
paper-and-pencil job knowledge tests; for 15 of those tasks, we also
constructed hands-on job sample tests. In several of the jobs, we developed
parallel versions of the job knowledge and hands-on tests in order to cover
various equipment systems in operation. Figure 3 lists the tasks for which
tests were developed for one of the nine jobs.

The analysis of the Army's job training programs included grouping of
the training objectives into duty areas, corresponding to the grouping of
tasks in the occupational surveys. Multiple-choice paper-and-pencil tests
of training achievement were constructed for the tasks in the duty areas,
for each of the 19 jobs. In order to cover the "incidental learning" of job
tasks not covered specifically during training, each test also included
items for tasks not included in the curricula. Army job experts rated each
item on its importance and relevance to training and to job performance;
these ratings were used in selecting items to appear on the tests. Figure 4
presents the duty areas for which items were developed for one of the 19
jobs.


As we went through these development activities, we became aware of
potential shortcomings in the set of performance measures, which could prove
to be important in interpreting results. Accordingly, additional measures
were developed. Three were to be administered for soldiers in all 19 jobs
(Figure 5). These included a single rating scale of Overall Effectiveness,
on which the rater was to consider performance in all of the categories on
the Army-wide BARS instrument; a single rating of NCO (noncommissioned
officer) Potential, which might well be independent of the Army-wide and
Overall Effectiveness ratings; and a single rating scale of Overall
Performance on Specific Job Duties.

Another area which was explored concerned records of administrative
actions, which the Army routinely maintains for all enlisted personnel.
Most of the information is maintained in noncomputerized files at the
soldier's unit of assignment; some is also forwarded to central computerized
files. Because obtaining the information from noncomputerized files was
excessively labor-intensive, we developed a self-report form, which asked
for information in five areas, including awards, letters of commendation,
and disciplinary actions; these seemed, on the basis of their base rates and
judged relevance, to have at least some potential for service as criterion
measures. This Personnel File Information Form was also to be administered
to soldiers in all 19 jobs.


233 


For the ten jobs which were being studied less intensively, we
developed a set of rating scales covering performance on 13 Common Task
dimensions, the Common Tasks being those which are required of all soldiers,
regardless of their jobs. The 13 dimensions include such things as basic
first aid and firing of individual weapons (see Figure 6). These were the
only source of job- or task-specific information which would be obtained
from soldiers in the ten jobs.

For the nine jobs which were subjected to intensive study, sets of
rating scales covering performance on the 15 tasks tested hands-on were
constructed; although these were not behaviorally anchored scales, we hoped
that they would provide a link between the job-specific BARS and the
hands-on performance results. A Job History Questionnaire, requesting
indication of the recency and frequency of performance on the tasks covered
by the job knowledge and hands-on tests, was also developed, in order to
assess the likely impact of experience effects on task performance
(Figure 7).

Army  Management  Agency  Reviews 

Throughout the initial development cycle, spanning the first three
years of the project, we sought and received extensive involvement from the
Army management agencies responsible for setting training and job
performance policy. All of our procedures for obtaining information, all of
our instruments, items, scales, and instructions, and all of our data
collection plans were closely monitored by personnel from these agencies. By
means of a series of formal briefings and reviews and informal discussions,
we received valuable advice and direction concerning future planning,
projections, and policies. Such management involvement has been and will
continue to be invaluable in maintaining the integrity of the criterion
development.

Pilot  Testing  and  Field  Tryouts 

All of the measures went through several iterations of pilot testing
and larger-scale field tryouts before they were finalized for the Concurrent
Validation. The pilot tests were used to ensure the technical accuracy,
readability, and acceptability of the measures. The field tryouts served as
a dry run for the Concurrent Validation. They involved testing of 114 to
178 soldiers in each of the nine intensively studied jobs, using all of the
Army-wide and job-specific instruments; tryouts of the training achievement
tests among soldiers completing job training provided the needed information
for the other ten jobs. Results were used to revise and refine the
instruments. The training achievement tests, job knowledge tests, and
expected combat performance scales were reduced in order to be administrable
within the time allotted for the Concurrent Validation; a small number of
scales were dropped; hands-on tests were revised to ensure reliable
observation and scoring; variables were added to the personnel information
form; and instructions were refined.


234 


The final array of the criterion measurement instruments is portrayed
in Figure 8. These were the tests and scales that were used in the
Concurrent Validation. The next papers will describe how the data on those
instruments and the predictor instruments were collected and analyzed.


This  research  was  funded  by  the  U.S.  Army  Research  Institute  for  the 
Behavioral and Social Sciences, Contract No. MDA903-82-C-0531. All
statements  expressed  in  this  paper  are  those  of  the  authors  and  do  not 
necessarily  express  the  official  opinions  or  policies  of  the  U.S.  Army 
Research  Institute  or  the  Department  of  the  Army. 

FIGURE 1

ARMY-WIDE  RATING  SCALES  (BARS)  DIMENSIONS 

•  Technical  knowledge/skill 

•  Initiative/effort 


•  Self-control 

JOB-SPECIFIC  RATING  SCALES  (BARS)  DIMENSIONS  FOR  CANNON  CREWMEN 

•  Loading out equipment

•  Driving  and  maintaining  vehicles,  howitzers,  and  equipment 


•  Position  improvement 
FIGURE  2 

COMBAT  PERFORMANCE  PREDICTION  SCALE  DIMENSIONS 

•  Cohesion/commitment  to  others 

•  Self-discipline/responsibility 

•  Mission  orientation 

•  Technical/tactical  knowledge 

•  Initiative 

FIGURE  3 

TRAINING  ACHIEVEMENT  TEST  DUTY  AREAS  FOR  CANNON  CREWMAN 

•  Cannon  equipment  emplacement/displacement 
•  Firing battery operations during firing


•  Communications  equipment  and  operator  maintenance 



FIGURE 4

TASKS COVERED BY JOB KNOWLEDGE AND HANDS-ON JOB SAMPLE TESTS
FOR CANNON CREWMAN

•  Perform cardiopulmonary resuscitation *

•  Prevent shock


•  Disassemble/assemble breech (M109; M110; M198; M102) *

*  Hands-on test developed (all tasks covered by job knowledge tests).

FIGURE 5

ADDITIONAL  MEASURES  FOR  SOLDIERS  IN  ALL  19  JOBS 

•  Single scale rating of Overall Effectiveness
•  Single scale rating of NCO Potential

•  Single scale rating of Overall Performance on Specific Job Duties
•  Personnel File Information Form - Variables:

Number of awards and decorations

Number of letters/certificates of appreciation, commendation,
achievement

Number of Articles 15/Flag actions (disciplinary actions)

Number  of  Military  Training  Courses 


FIGURE 6

ADDITIONAL  MEASURES  FOR  SOLDIERS  IN  TEN  JOBS 

•  Rating scales on Common Task Areas:

See:  Identifying  Threat  (armored  vehicles,  aircraft) 

See:  Estimating  Range 
.
.
.

Survive:  Knowing  and  Applying  the  Customs  and  Laws  of  War 

FIGURE 7

ADDITIONAL  MEASURES  FOR  SOLDIERS  IN  NINE  JOBS 

•  Task Performance Rating Scales - on the 15 tasks tested hands-on

•  Job History Questionnaire - recency and frequency of performance
on 30 tasks in job knowledge and/or hands-on tests




FIGURE  8 


CRITERION MEASUREMENT INSTRUMENTS FOR CONCURRENT VALIDATION

Performance Measures for All 19 Jobs:

•  Army-Wide Rating Scales (all obtained from both supervisors and peers).

-  Ten behaviorally anchored rating scales (BARS) designed
to measure factors of non-job-specific performance.

-  Single scale rating of Overall Effectiveness.

-  Single scale rating of NCO Potential.

•  Combat Performance Prediction Scale (obtained from both supervisors and
peers) containing 40 items.

•  Paper-and-pencil test of Training Achievement, developed for each of the
19 jobs (130-210 items each).

•  Personnel File Information Form, developed to gather objective archival
records data (awards and letters, rifle marksmanship scores, physical
training scores, etc.). Self-report.

Performance Measures for Nine Jobs Only:

•  Job Sample (Hands-On) tests of job-specific task proficiency.

-  Individual is tested on each of about 15 major job tasks in a job.

•  Paper-and-pencil Job Knowledge Tests designed to measure task-specific
job knowledge.

-  Individual is scored on 150 to 200 multiple-choice items representing
about 30 major job tasks. Ten to 15 of the tasks were also measured
hands-on.

•  Rating scale measures of specific task performance on the tasks measured
with the hands-on tests.

•  Job-Specific Rating Scales (obtained from both supervisors and peers).

-  From 6 to 12 behaviorally anchored rating scales (BARS), developed for
each job, to represent the major factors that constitute job-specific
technical and task proficiency.

-  Single scale rating of Job Performance.

•  A Job History Questionnaire which asks for information about frequency
and recency of performance of the job-specific tasks (self-report).

Performance Measures for Ten Jobs Only:

•  Army-Wide Rating Scales (all obtained from both supervisors and peers).

-  Ratings of performance on common tasks (e.g., basic first aid).

-  Single scale rating on performance of specific job duties.




CREDITS 


Borman, W. C., Motowidlo, S. J., Rose, S. J., & Hanser, L. M. (October,
1985) "Development of a Model of Soldier Effectiveness." ARI Technical
Report.

Campbell, C. H., Campbell, R. C., Rumsey, M. G., & Edwards, D. C. (October,
1985) "Development and Field Test of Task-Based MOS-Specific Criterion
Measures." ARI Technical Report 717.

Campbell, J. P. (Ed.) (October, 1985) "Improving the Selection,
Classification, and Utilization of Army Enlisted Personnel: Annual Report,
1985 Fiscal Year." ARI Technical Report.

Campbell, J. P., & Harris, J. H. (August, 1985) "Criterion Reduction and
Combination via a Participative Decision-Making Panel." Paper presented at
Annual Convention of the American Psychological Association, Los Angeles.

Davis, R. H., Davis, G. A., Joyner, J. N., & de Vera, M. V. (October, 1985)
"Development and Field Test of Job-Relevant Knowledge Tests for Selected
MOS." ARI Technical Report.

Pulakos, E. D., & Borman, W. C. (Eds.) (October, 1985) "Development and
Field Test of Army-Wide Rating Scales and the Rater Orientation and Training
Program." ARI Technical Report 716.

Riegelhaupt, B. J., Harris, C. D., & Sadacca, R. (October, 1985)
"Development of Administrative Measures as Indicators of Soldier
Effectiveness." ARI Technical Report.

Riegelhaupt, B. J., & Sadacca, R. (in preparation) "Development of Combat
Prediction Scales." ARI Technical Report.

Toquam, J. L., McHenry, J. J., Corpe, V. A., Rose, S. R., Lammlein, S. E.,
Kemery, E., Borman, W. C., Mendel, R., & Bosshardt, M. J. (in preparation)
"Development and Field Test of Behaviorally Anchored Rating Scales for Nine
MOS." ARI Technical Report.




ANALYSIS  OF  CRITERION  MEASURES:  THE  MODELING  OF  PERFORMANCE 


John P. Campbell

Human  Resources  Research  Organization 

Jeffrey  J.  McHenry 
Lauress  L.  Wise 

American  Institutes  for  Research 


Presented  at  the  Annual  Conference  of  the 
Society for Industrial and Organizational Psychology

Atlanta,  Georgia 

April  1987 


The  views  expressed  in  this  paper  are  those  of  the  authors  and  do  not 
necessarily  reflect  the  official  opinions  and  policies  of  the  U.S.  Army 
Research  Institute  or  the  Department  of  the  Army. 




The paradigm of Project A is simply that of a criterion-related validity
study, albeit a very large one that examines an entire system at once.
Previous papers have discussed predictor development, criterion development,
and data editing and preparation. This paper is intended to illustrate
further the usefulness of a good theory, or even a poor one, in applied
research. It recounts our attempt to model job performance in this population
of jobs and to maximize our understanding of the previously described
criterion measures. Recall that multiple methods were used to assess
individuals on a wide array of performance components. Great care was taken in
the task analysis and critical incident analysis to build in as much content
validity as possible, and considerable resources were devoted to careful
measurement development.

THE  INITIAL  FRAMEWORK 

The overall criterion development work was guided by a particular "theory"
of performance, the basic outline of which is as follows. First, job
performance really is multi-dimensional. There is not one outcome, one factor, or
one anything that can be pointed to and labeled as job performance. It is
manifested by a wide variety of behaviors, or things people do, that are
judged to be important for accomplishing the goals of the organization.

Two  General  Factors 

For the population of entry-level enlisted positions we postulated that
there are two major types of job performance components. The first is
composed of components that are specific to a particular job. That is,
measures of such components would reflect specific technical competence or
specific job behaviors that are not required for other jobs. The second kind of
performance factor includes components that are defined and measured in the
same way for every job. These are referred to as Army-wide criterion factors.

For  the  job  specific  components,  we  anticipated  that  there  would  be  a 
relatively  small  number  of  distinguishable  factors  of  technical  performance 
that  would  be  a  function  of  different  abilities  or  skills  and  which  would  be 
reflected  by  different  task  content. 

The Army-wide concept incorporates the basic notion that total
performance is much more than task or technical proficiency. It might include
such things as contributions to teamwork, continual self-development, support
for the norms and customs of the organization, and perseverance in the face
of adversity.

In sum, the working model of total performance with which the project
began viewed performance as multi-dimensional within the two broad categories
of factors. The job analysis and criterion construction methods were
designed to "discover" the content of these factors via an exhaustive
description of the total performance domain, several iterations of data collection,
and the use of multiple methods for identifying basic performance factors.

Factors  vs.  a  Composite 

Saying that performance is multi-dimensional does not preclude using
just one index of an individual's contributions to make a specific personnel
decision (e.g., select/not select, promote/not promote). As argued by
Schmidt and Kaplan (1971) some years ago, it seems quite reasonable for the
organization to scale the importance of each major performance factor
relative to a particular personnel decision that must be made and to combine
the weighted factor scores into a composite that represents the total
contribution or utility of an individual's performance, within the context of
that decision. That is, the way in which performance information is weighted
and combined is a value judgment on the organization's part. The
determination of the specific combinational rules (e.g., simple sum, weighted sum,
non-linear combination) that best reflect what the organization is trying
to accomplish is a matter for research.
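
As an illustration only, the weighted-sum rule mentioned above can be sketched
in a few lines of Python. The factor names, standardized scores, and weights
below are hypothetical and are not taken from the Project A data; the point is
simply that the same factor scores yield different composites when the
organization weights them differently for different personnel decisions.

```python
def composite_score(factor_scores, weights):
    """Weighted sum of one individual's standardized factor scores."""
    if set(factor_scores) != set(weights):
        raise ValueError("factor_scores and weights must name the same factors")
    return sum(weights[name] * factor_scores[name] for name in weights)


# Hypothetical standardized scores on three illustrative performance factors.
scores = {"technical": 0.5, "effort": 1.0, "discipline": -0.5}

# Hypothetical importance weights for two different personnel decisions.
promotion_weights = {"technical": 0.2, "effort": 0.5, "discipline": 0.3}
assignment_weights = {"technical": 0.6, "effort": 0.2, "discipline": 0.2}

promotion_composite = composite_score(scores, promotion_weights)
assignment_composite = composite_score(scores, assignment_weights)
```

A simple sum is the special case in which every weight equals one; a
non-linear rule would replace the sum with some other combining function.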

Needed: The Latent Structure of Performance

If all the rating scales are used separately, the MOS-specific measures
are aggregated at the task or instructional module level, and the major
predictor subscales are used, there are approximately 200 criterion scores on
each individual, which is too many to handle. Adding them all up into a
composite is a bit too atheoretical, and developing a reliable and homogeneous
measure of the general factor violates the basic notion that performance is
multi-dimensional. A more formal way to model performance is to think in
terms of its latent structure, postulate what that might be, and then resort
to a confirmatory analysis. Unfortunately, it is true that we simply know a
lot more about predictor constructs than we do about job performance
constructs. There are volumes of research on the former, and almost none on the
latter. For personnel psychologists it is almost second nature to talk about
predictors in terms of theories and constructs. However, on the performance
side, the textbooks are virtually silent. Only a few people have even raised
the issue (e.g., Dunnette, 1963; Wallace, 1965).

Given this initial disparity, we used our own expert judgment, the
previous literature, and data from pilot and field tests to formulate a target
model. In the field tests, the various versions of the criterion measures
were administered to 100-150 people from each of 9 MOS. These data and the
development work leading up to them are summarized in Campbell (1985) and
Campbell and Harris (1985). A picture we drew at the time is shown in Figure
1. It is included only to show one stage in the almost continuous process
of bootstrapping ourselves toward a more final conceptual description of the
predictor/criterion space. The target model was then subjected to what
might be described as a "quasi" confirmatory analysis using data from the
concurrent validation sample. The purpose was to consider whether a single
model of the latent structure of job performance would fit the data for all
nine jobs. It is the results from these analyses that we report here.

PROCEDURE 

As  described  previously,  the  final  versions  of  the  criterion  measures 
were  administered  to  a  concurrent  validation  sample  of  400-600  people  in  each 
of  the  19  jobs  (MOS).  The  complete  array  of  performance  measures  is  shown  in 
Table 1.

The  distinction  between  the  Batch  A  (9  MOS)  and  Batch  Z  (10  MOS)  is  that 
not  all  criterion  measures  were  developed  for  each  job  in  Batch  Z.  Budget 
constraints  dictated  that  the  job-specific  measures  could  only  be  developed 
for  a  limited  number  of  jobs  (i.e.  Batch  A). 

Each "hands-on" test consisted of a number of critical steps, with each
step scored pass or fail. The number of steps within a task varied widely
from a half-dozen up to a maximum of 62. The job knowledge test consisted
of 3 to 15 questions on each of a sample of 30 tasks (including the 15 also
sampled for hands-on testing). The school knowledge test was organized
around the "plan of instruction" in advanced individual (technical) training.
Each test consisted of 100 to 200 items. The rating scales that were
administered included 10 Army-wide (i.e., the scales were the same for all jobs)
behaviorally anchored scales, from 8 to 13 job-specific behaviorally anchored
scales, ratings of performance on each of the 15 tasks tested hands-on, and
a 40-item combat performance prediction questionnaire. Overall ratings of
general effectiveness as a soldier and potential for being an effective NCO
were also obtained.

The performance indicators contained in official personnel records, but
obtained chiefly via self-report questionnaire, included such indicators as
number of letters and certificates received, physical readiness test score,
Articles 15 and other disciplinary actions, and M16 qualification level.
File data were also used to construct a promotion rate score (relative to
expected rate for a given length of service). The administrative measures
were grouped into five scales on the basis of content; no attempts were
made to further reduce these scales at this point.

RESULTS 

The  analysis  had  four  major  steps: 

1.  Determining  a  basic  array  of  criterion  scores  that  would 
constitute  the  input  to  the  confirmatory  analysis.  In  their 
unaggregated  form,  there  were  simply  too  many  variables  to 
theorize  about. 

2.  Specification  of  a  theory,  or  target  matrix,  that  could  be 
subjected  to  LISREL. 

3.  Determination  of  how  much  modification  is  necessary  to  fit 
the  data  adequately  for  each  job. 

4.  Examining the fit of an overall model across all MOS.

Reduction  of  the  Hands-On  and  Written  Test  Variables 

Initial analyses indicated that individual task scores from the hands-on
and written job knowledge tests had only moderate internal consistency.
Consequently, tasks were grouped by 6 research staff members into "functional
or content categories" on the basis of similarity of task content.
The 30 tasks sampled for each job were clustered into 8 to 15 categories.
Each of the school knowledge items was similarly grouped into a specific
content category.

Ten  of  the  categories  were  common  to  some  or  all  of  the  jobs  (e.g., 
first  aid,  basic  weapon,  field  techniques).  Each  job,  except  Infantryman, 
also  had  two  to  five  performance  categories  that  were  unique  or  job  specific. 
Figure  2  shows  both  the  common  and  job  specific  item  categories  used  for 
each  of  the  nine  jobs.  Figure  3  includes  a  sample  of  the  definitions  that 
were  generated  for  each  content  category. 

Next, scores were computed for each content category within each of the
three sets of measures. For the hands-on test, the functional category score
was the mean percent of successfully completed steps across all of the tasks
assigned to that category. For the job knowledge test and the school
knowledge test, the functional category score was the percent of items within that
category that were answered correctly.
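
The two scoring rules just described can be sketched as follows. This is a
minimal illustration under one stated assumption: the hands-on category score
is taken to be the mean of the per-task percentages (rather than a percentage
pooled over all steps), which is one reading of the description above. The
function names and the data are hypothetical.

```python
def hands_on_category_score(task_steps):
    """Mean percent of steps passed across the tasks in one category.

    task_steps: list of tasks, each a list of booleans (True = step passed).
    Assumption: each task's percent is computed first, then averaged.
    """
    percents = [100.0 * sum(steps) / len(steps) for steps in task_steps]
    return sum(percents) / len(percents)


def written_category_score(item_correct):
    """Percent of the items in one category that were answered correctly."""
    return 100.0 * sum(item_correct) / len(item_correct)


# Hypothetical soldier: two hands-on tasks assigned to one content category.
first_aid_tasks = [
    [True, True, False, True],  # 3 of 4 steps passed (75 percent)
    [True, False],              # 1 of 2 steps passed (50 percent)
]
hands_on_score = hands_on_category_score(first_aid_tasks)

# Hypothetical written items assigned to the same category (3 of 5 correct).
written_score = written_category_score([True, True, True, False, False])
```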

After category scores were computed, they were factor analyzed via
principal components. Separate factor analyses were executed for each type
of measure within each job. There were several common features in the
results. First, the unique or specific categories for each job tended to load
on different factors than the common categories. Second, the factors that
emerged from the common categories tended to be fairly similar across the
nine different jobs and across the three methods. Some of the categories
were not sampled in one or more of the tests for some jobs, so some
differences were inevitable.
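
For readers unfamiliar with principal components, the sketch below extracts
only the first (largest) component from a correlation matrix by power
iteration. It is not the program used in the project, which is not named in
this paper; the correlation matrix is hypothetical, and a full analysis would
extract and rotate several components.

```python
import math


def first_principal_component(corr, iterations=500):
    """Return (eigenvalue, loadings) for the largest principal component of a
    correlation matrix, found by power iteration from a uniform start vector."""
    n = len(corr)
    vec = [1.0 / math.sqrt(n)] * n
    eigenvalue = 0.0
    for _ in range(iterations):
        product = [sum(corr[i][j] * vec[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in product))
        vec = [x / norm for x in product]
        eigenvalue = norm  # converges to the largest eigenvalue
    # Loadings are the unit eigenvector scaled by the square root of the eigenvalue.
    loadings = [v * math.sqrt(eigenvalue) for v in vec]
    return eigenvalue, loadings


# Hypothetical correlations among three content category scores.
corr = [
    [1.0, 0.5, 0.5],
    [0.5, 1.0, 0.5],
    [0.5, 0.5, 1.0],
]
eigenvalue, loadings = first_principal_component(corr)
```

With all correlations equal to .5, the first component has an eigenvalue of
2.0 and equal loadings of about .82 on each of the three scores. Power
iteration from a uniform start is adequate when the correlations are
predominantly positive, as in this example.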

Using these exploratory empirical factor analyses as a guide, the
following set of content categories was identified.

1.  Basic Soldiering Skills (field techniques, weapons, navigate,
customs and laws).

2.  Safety/Survival  (first  aid,  nuclear-biological-chemical 
safety). 

3.  Communications  (radio  operation). 

4.  Vehicle  Maintenance. 

5.  Identify Friendly/Enemy Aircraft and Vehicles.

6.  Technical  Skills  (specific  to  the  job). 

At this point, the categories reflected an integration of expert
judgment and the results of the factor analyses.

Reduction  of  the  Rating  Variables 

As noted in a previous paper (Campbell, 1986), the individual
rating scales were reasonably reliable; however, the different scales
exhibited intercorrelations varying from moderate to high. Further reduction
in the number of scales was aimed at reducing redundancy and collinearity.

As also noted in a previous paper (Campbell, 1986), empirical
factor analyses of the Army-wide rating scales suggested three factors. These
were:

1.  Effort/Leadership: including effort and competence in
performing job tasks; leadership; and self-development.

2.  Maintaining Personal Discipline: including self-control;
integrity; and following regulations.

3.  Fitness and Appearance: including physical fitness and
maintaining proper military bearing and appearance.

Similar exploratory factor analyses were conducted for the job-specific
BARS scales, and two factors within each job were identified. The first
consisted of scales reflecting performance that seemed to be most central to
the specific technical content of each job. The second factor included the
rating scales that seemed to reflect more tangential or less central
performance components. Again, the final formulation of factors was based on a
combination of empirical and judgmental considerations.

The reliability, intercorrelations, and distributional properties of
the task-specific ratings for each of the 30 tasks also tested with the
knowledge tests were also examined. In general, these scales were less reliable
than either the Army-wide or the job-specific behavioral summary scales.
Supervisors and peers often reported that they had never had an opportunity to
observe their ratees' performance on many of the tasks, leading to a
significant missing data problem. Consequently, the task ratings were dropped
from the present analyses.

The individual items in the combat performance prediction battery also
were subjected to a principal components analysis. Two factors seemed to emerge
from an analysis on the combined sample. The first factor consisted of items
depicting exemplary effort, skill, or dependability under stressful conditions.
The second factor consisted of items portraying failure to follow
instructions and lack of discipline under stressful conditions.

The  Final  Array 

Based  on  the  above  exploratory  analyses,  the  reduced  array  of  criterion 
variables  for  each  job  consisted  of: 




•  2-5  hands-on  content  category  scores 

•  2-6  job  knowledge  content  category  scores 

•  2-6  school  knowledge  content  category  scores 

•  3  Army-wide  rating  factors 

•  2  job-specific  rating  factors 

•  2  combat  performance  prediction  rating  factors 

•  1  overall  effectiveness  rating 

•  5  administrative  measures  scale  scores 

Tables 2 through 10 show the means, standard deviations, and
intercorrelations among these variables for each of the nine jobs.


Building  the  Target  Model 

The next step was to build a target model of job performance that could
be tested for goodness of fit within each of the nine jobs. The initial
model shown in Figure 1 was a starting point. The correlation matrices shown
in Tables 2 through 10 were each subjected to another round of empirical
factor analysis to suggest possible modifications.

Several consistent results were observed in the different factor
analyses. First, as expected, there was the general prominence of "method" factors,
specifically one methods factor for the ratings and one methods factor for
the written tests. The emergence of method factors was anticipated and was
consistent with prior findings (e.g., Landy and Farr, 1980).

The second consistent result was a correspondence between the
administrative measures scales and the three Army-wide rating factors. The awards
and certificates scale from the administrative measures loaded together with
the Army-wide effort/leadership rating factor; the Article 15 and promotion
rate scale loaded with the personal discipline factor (most of the variance
in promotion rate was thought to be due to retarded advancement associated
with disciplinary problems); and the physical readiness scale loaded with
the fitness/appearance factor.

A third observation from the empirical factor analyses was that, with
the possible exception of the job-specific content factors, there was not
much evidence that the factors reflecting task performance crossed
measurement methods. The hands-on communication score, for example, was likely to
be as correlated with the written safety score as with the written
communication score. This result was taken as evidence against being able to
separate measurement of task knowledge versus task performance skill within
the common task domain.

Based on these findings from the exploratory empirical analyses, a
revised model was constructed to account for the correlations among our
performance measures. This model included the five job performance constructs
shown in Figure 4. In addition, a "paper-and-pencil test" methods factor
and a ratings "method" factor were retained.


Several minor issues remained before the model could be tested for
goodness of fit within the nine Batch A jobs. One was whether the job-specific
BARS rating scales were measuring job-specific technical knowledge and skill,
or effort and leadership, or both. The intercorrelations among our performance
factors suggested that these rating scales were measuring both of these
performance constructs, though they seemed to correlate more highly with other
measures of effort and leadership than with measures of job-specific technical
knowledge and skill.

Another issue was whether it was necessary to posit hands-on and
administrative measures "method" factors to account for the intercorrelations within
each of these sets of measures. The average intercorrelation among the scores
within each of these sets was not particularly high. Therefore, for the sake
of parsimony, we decided to try to fit a model without these two additional
methods factors.

Confirming the Model Within Each Job

The  next  step  in  the  analysis  was  to  conduct  separate  tests  of  goodness 
of  fit  of  this  target  model  within  each  of  the  nine  jobs.  This  was  done 
using  the  LISREL  confirmatory  factor  analysis  program  (Joreskog  &  Sorbom,  1981). 

In conducting a confirmatory factor analysis with LISREL, it is
necessary to specify the structure of three different parameter matrices:
Lambda-Y, the hypothesized factor structure matrix (a matrix of regression
coefficients for predicting the observed variables from the underlying latent
constructs); Theta-Epsilon, the matrix of uniqueness or error components
(and intercorrelations); and Psi, the matrix of covariances among the factors.

In these analyses, we set the diagonal elements of Psi (i.e., the factor
variances) to one, forcing a "standardized" solution. This meant that the
off-diagonal elements in Psi would represent the correlations among and
between our performance constructs and method factors. We further specified
that the correlation between each of the two method factors and each
performance construct should be zero. This effectively defined each method factor
as that portion of the common variance among measures from the same method that
was not predictable from (i.e., correlated with) any of the other related factor
or performance construct scores.
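
To make the role of the three matrices concrete, the sketch below computes the
correlations implied by a small hypothetical model as Lambda-Y times Psi times
the transpose of Lambda-Y, plus the Theta-Epsilon diagonal. This is the
standard factor-analytic identity, written here in plain Python rather than
LISREL; all loadings and uniquenesses are invented for illustration, and
Theta-Epsilon is assumed to be diagonal.

```python
def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(m):
    return [list(row) for row in zip(*m)]


def implied_correlations(lambda_y, psi, theta_diag):
    """Model-implied matrix: Lambda-Y * Psi * Lambda-Y' + diag(Theta-Epsilon)."""
    implied = matmul(matmul(lambda_y, psi), transpose(lambda_y))
    for i, uniqueness in enumerate(theta_diag):
        implied[i][i] += uniqueness
    return implied


# Hypothetical model: three observed variables, one performance construct
# (column 0) and one ratings method factor (column 1).
lambda_y = [
    [0.6, 0.5],  # rating scale A
    [0.6, 0.5],  # rating scale B
    [0.8, 0.0],  # hands-on score: no loading on the ratings method factor
]
psi = [
    [1.0, 0.0],  # factor variances fixed at one ("standardized" solution)
    [0.0, 1.0],  # construct/method correlation fixed at zero
]
theta_diag = [0.39, 0.39, 0.36]  # hypothetical uniquenesses

sigma = implied_correlations(lambda_y, psi, theta_diag)
```

In this example the two ratings are expected to correlate .61 with each other
(.36 from the shared construct plus .25 from the shared method) but only .48
with the hands-on score, which shares the construct alone.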




Some problems were encountered in fitting the hypothesized model for
several of the jobs. Solutions were obtained with some factor loadings
greater than one and with negative uniqueness estimates for the
corresponding observed variables. Also, estimates of the correlations among the
performance constructs occasionally exceeded unity. These problems necessitated
a certain amount of ad hoc cutting and fitting in the form of computing the
squared multiple correlation (SMC) for predicting each observed variable
from all of the other variables, and setting the uniqueness estimates (i.e.,
Theta-Epsilon diagonal) to one minus this SMC. This approach eliminated
all factor loadings and correlations greater than one. In most cases, a
second "iteration" was performed to adjust the initial Theta-Epsilon estimates
so that the diagonal of the estimated correlation matrix would be as close
to one as possible.
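
The first step of this adjustment can be sketched as follows, using the
standard identity that the SMC of variable i equals one minus the reciprocal
of the i-th diagonal element of the inverse correlation matrix, so that the
uniqueness (one minus SMC) is simply that reciprocal. The correlation matrix
is hypothetical, the helper names are ours, and the second "iteration"
described above is not reproduced because its details are not given.

```python
def invert(matrix):
    """Invert a square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(matrix)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [x / scale for x in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [x - factor * y for x, y in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def smc_uniquenesses(corr):
    """Uniqueness of each variable, 1 - SMC, where SMC_i = 1 - 1 / (R^-1)_ii."""
    inverse = invert(corr)
    return [1.0 / inverse[i][i] for i in range(len(corr))]


# Hypothetical correlations among three observed criterion variables.
corr = [
    [1.0, 0.5, 0.5],
    [0.5, 1.0, 0.5],
    [0.5, 0.5, 1.0],
]
uniquenesses = smc_uniquenesses(corr)
```

Here each variable has an SMC of one third, so each uniqueness is fixed at two
thirds.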

Table 11 shows the final factor loading estimates from Lambda-Y for each
job.  Tables  12  and  13  show  the  uniqueness  estimates  from  Theta-Epsilon  and 
the  factor  intercorrelation  estimates  from  Psi,  respectively. 

LISREL also computes a goodness-of-fit index based on a comparison of
the actual correlations among the observed variables and the correlations
estimated from Lambda-Y, Theta-Epsilon, and Psi. The goodness-of-fit index is
distributed as chi-square, with degrees of freedom dependent on the number of
observed variables and the number of parameters estimated. The expected
value of chi-square is equal to the degrees of freedom; if the obtained value
is much larger than the degrees of freedom, it is a sign that
the model does not fit the correlations among the observed variables.
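
The comparison of a chi-square value with its degrees of freedom can be
sketched with the Wilson-Hilferty normal approximation to the chi-square
distribution. This approximation is our own choice for a self-contained
example and is not necessarily the computation LISREL performs; the chi-square
values and degrees of freedom below are hypothetical.

```python
import math


def chi_square_upper_tail(chi_square, df):
    """Approximate P(X >= chi_square) for X distributed chi-square(df),
    using the Wilson-Hilferty cube-root normal approximation."""
    spread = 2.0 / (9.0 * df)
    z = ((chi_square / df) ** (1.0 / 3.0) - (1.0 - spread)) / math.sqrt(spread)
    return 0.5 * math.erfc(z / math.sqrt(2.0))  # upper tail of the standard normal


# Hypothetical model with 100 degrees of freedom.
p_near_expected = chi_square_upper_tail(100.0, 100)  # chi-square equal to its df
p_poor_fit = chi_square_upper_tail(150.0, 100)       # chi-square well above its df
```

A chi-square close to its degrees of freedom gives a probability near .5,
whereas a value well above the degrees of freedom gives a very small
probability and would lead to rejection of the model.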

Table 14 shows the value of chi-square for each job. These chi-square
values should be interpreted with considerable caution. The approach we
used was not purely confirmatory. The hypothesized target model was based
in part on analyses of these same data. In addition, LISREL was "told"
that the Theta-Epsilon (uniqueness) parameters were all fixed, and therefore
did not "use up" any degrees of freedom estimating these parameters; in
fact, these values were estimated entirely from the data.


Confirmation of the Overall Model

The results of the confirmatory procedures applied to the performance
measures from each job generally supported a common structure of job
performance. The procedures also yielded reasonably similar estimates of the
intercorrelations among the constructs and of the loadings of the observed
variables on these constructs across the nine jobs.

The final step was to determine whether the variation in some of these
parameters across jobs could be attributed to sampling variation. The
specific model that we explored stated that (1) the correlation among factors
was invariant across jobs and (2) the loadings of all of the Army-wide
measures on the performance constructs and on the rating method factor were also
constant across jobs.

The proposed overall model was a relatively stringent test of a common
latent structure. For example, it was quite possible that selectivity
differences in the different jobs would lead to differences in the apparent
measurement precision of the common instruments or differences in the
correlations between the constructs. This would tend to make it appear that
the different jobs required different performance models, when in fact they
do not.

The LISREL multi-groups option requires that the number of observed
variables be the same for each job. However, virtually every job was missing
scores on at least one of the five construct categories for at least one of
the three knowledge and skill measurement methods. To handle this problem,
the Theta-Epsilon error estimates for these variables were set at 1.00, and
the observed correlations between these variables and all the other variables
were set to zero. It was thus necessary to count the number of "observed"
correlations that we generated in this manner and subtract this number from
the degrees of freedom when determining the significance of the chi-square
goodness-of-fit statistic.
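
The padding device just described can be sketched in a few lines of Python.
This is only an illustration under stated assumptions, not the procedure as
actually run: the function name pad_missing and the toy matrix are
hypothetical, and the count assumes that every correlation involving a
missing variable (including one between two missing variables) is treated as
artificially generated.

```python
def pad_missing(corr, present, n_total):
    """Embed an observed correlation matrix in an n_total x n_total layout.

    corr    -- square list of lists for the variables actually measured
    present -- 0-based positions of those variables in the full layout
    Missing variables get a unit diagonal and zero correlations with
    everything else, as described in the text.
    """
    full = [[1.0 if i == j else 0.0 for j in range(n_total)]
            for i in range(n_total)]
    for a, i in enumerate(present):
        for b, j in enumerate(present):
            full[i][j] = corr[a][b]
    n_missing = n_total - len(present)
    # Distinct off-diagonal entries that involve at least one missing variable
    # (assumption: these are the "observed" correlations to be discounted).
    n_artificial = (n_missing * len(present)
                    + n_missing * (n_missing - 1) // 2)
    return full, n_artificial


# Hypothetical job measured on variables 0 and 2 of a 4-variable layout.
observed = [[1.0, 0.3],
            [0.3, 1.0]]
full, n_artificial = pad_missing(observed, present=[0, 2], n_total=4)
print(n_artificial)  # amount to subtract from the nominal degrees of freedom
```

In this toy case only one of the six distinct correlations is real, so the
remaining ones would be subtracted from the nominal degrees of freedom.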

The  overall  model  fit  extremely  well.  The  root  mean  square  residual 
was  .047,  and  the  chi-square  was  2508.1.  There  were  2403  degrees  of  freedom 
after  adjusting  for  missing  variables  and  the  use  of  the  data  in  estimating 
uniquenesses.  This  yields  a  significance  level  of  .07,  not  enough  to  reject 
the  model.  Tables  15  and  16  show  the  factor  loadings  and  uniqueness  for  each 
job  under  this  constrained  model.  Table  17  shows  the  final  mapping  of  the 
criterion  measures  on  the  five  performance  factors. 
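
As a rough check on the significance level reported above, the upper-tail
probability of a chi-square with many degrees of freedom can be approximated
with the Wilson-Hilferty normal approximation. The sketch below is an
illustration only: the function name is hypothetical, and the approximation
stands in for the exact chi-square distribution.

```python
import math


def chi_square_upper_tail(chi2, df):
    """Approximate P(X >= chi2) for X ~ chi-square(df) (Wilson-Hilferty)."""
    v = 2.0 / (9.0 * df)
    # Cube-root transform of chi2/df is approximately normal for large df.
    z = ((chi2 / df) ** (1.0 / 3.0) - (1.0 - v)) / math.sqrt(v)
    # Upper tail of the standard normal distribution.
    return 0.5 * math.erfc(z / math.sqrt(2.0))


# Values reported in the text for the constrained overall model.
p = chi_square_upper_tail(2508.1, 2403)
print(round(p, 2))
```

With the chi-square and degrees of freedom given in the text, the
approximation lands close to the .07 quoted above.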

Criterion  Intercorrelations 

Five residual scores were then created from the five criterion factors
in the following manner. A paper-and-pencil "methods" factor score was created
by first summing the two paper-and-pencil knowledge tests (job knowledge and
training content knowledge scores) and then partialing out the variance due
to the correlation of the total paper-and-pencil test score with all
non-paper-and-pencil criterion measures (e.g., hands-on scores, rating scores,
and administrative records scores). This residual was defined as the
paper-and-pencil method score. This variable was in turn partialed from the
Core Technical Proficiency criterion factor and from the General Task
Proficiency factor, creating two residual scores. A similar procedure was used
to create a rating method factor score, which was in turn partialed from the
Effort/Leadership, Personal Discipline, and Physical Fitness/Military Bearing
factors, thereby creating three more residual scores.
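
The partialing step can be illustrated with a minimal Python sketch. It
assumes a single predictor and ordinary least squares; the method score in
the text was itself residualized on several measures at once, which would
require multiple regression. The function names and the six scores are made
up for illustration.

```python
def mean(xs):
    return sum(xs) / len(xs)


def residualize(y, x):
    """Return y with the linear effect of x removed (simple regression)."""
    mx, my = mean(x), mean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    return [b - my - slope * (a - mx) for a, b in zip(x, y)]


def correlation(x, y):
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


# Made-up scores for six soldiers (illustration only).
method_score = [2.0, 4.0, 5.0, 7.0, 8.0, 10.0]
core_technical = [50.0, 55.0, 61.0, 60.0, 70.0, 72.0]

residual = residualize(core_technical, method_score)
# The residual score is uncorrelated with the method score by construction.
print(abs(correlation(residual, method_score)) < 1e-9)
```

The residual keeps whatever variance in the criterion factor is not linearly
predictable from the method score.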

The five criterion factor scores, the five residual criterion scores,
the single rating obtained from the overall performance rating scales, and
the total score from the hands-on test were used to generate a 12 x 12 matrix
of criterion intercorrelations for each MOS in Batch A. The averages of these
correlations across MOS are shown in Table 18.
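
A minimal sketch of the averaging step follows. It assumes a simple
unweighted, element-wise mean of the MOS matrices; the text does not say
whether any weighting or transformation was applied. The two small matrices
are hypothetical stand-ins for the 12 x 12 matrices.

```python
def average_matrices(matrices):
    """Element-wise unweighted mean of equally sized square matrices."""
    n = len(matrices[0])
    k = len(matrices)
    return [[sum(m[i][j] for m in matrices) / k for j in range(n)]
            for i in range(n)]


# Two hypothetical 3 x 3 correlation matrices standing in for two MOS.
mos_a = [[1.0, 0.50, 0.20], [0.50, 1.0, 0.30], [0.20, 0.30, 1.0]]
mos_b = [[1.0, 0.60, 0.10], [0.60, 1.0, 0.40], [0.10, 0.40, 1.0]]

avg = average_matrices([mos_a, mos_b])
print(avg[0][1])
```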

Remember that to create the residual scores the paper-and-pencil factor
was partialed from the first two criterion factors and the rating method
factor was partialed from the last three criterion factors. The
intercorrelations of the 5 criterion factors are in the upper left quadrant,
the intercorrelations among the 5 residual scores are in the lower right
quadrant, and the cross correlations are in the upper right and lower left.
Also remember that the first two factors contain items from both the knowledge
tests and hands-on tests and the last three factors all contain both ratings
and administrative measures.

Some  noteworthy  features  of  this  12  x  12  matrix  are  the  following. 

•  The intercorrelations of the factor pairs which confound
measurement method (e.g., 1 with 2 or 3 with 4) are higher, as
expected, than factor pairs which do not confound method (e.g.,
1 with 3 or 2 with 4). However, they are not so high that
collapsing the five factors into some smaller number would be
justified. In fact, as illustrated later (McHenry),
factors 1 and 2, which intercorrelate .531 on the average,
yield different profiles of correlations with the selection
tests.

•  The correlation of the overall performance rating scale with
the total hands-on test score is low (.203) but it is certainly
not zero. Assuming a reliability of about .60 for each measure
would yield an intercorrelation of about .34 when corrected for
attenuation. Consequently, there is a substantial proportion of
common variance between the two measures but by no means do they
assess the same things. Assuming for the moment that the reliable
variance in each measure is relevant to performance, a reasonable
conclusion is that while performance on a standardized job sample
is a significant component of performance it is by no means all
of it.

•  The correlations of the residualized factor 3 (effort/leadership
residual) with the core technical factor, the residual core
technical factor, the general task proficiency factor, the
overall rating scale, and the hands-on total score are all about
the same. Also, as compared to the correlation of the
effort/leadership raw scores with these same variables, the
correlations of effort/leadership residual with the core
technical and general task proficiency factors go up while the
correlations with personal discipline and physical fitness go
down. Residualizing factor three (by removing the rating method
factor) makes it more like a "can do" factor and less like a
"will do" factor.
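
The correction for attenuation used in the second point above divides the
observed correlation by the square root of the product of the two
reliabilities. A short sketch using the values quoted in the text (the
function name is hypothetical):

```python
import math


def correct_for_attenuation(r_xy, r_xx, r_yy):
    """Estimated true-score correlation given the two reliabilities."""
    return r_xy / math.sqrt(r_xx * r_yy)


# Observed correlation .203, assumed reliability of .60 for each measure.
corrected = correct_for_attenuation(0.203, 0.60, 0.60)
print(round(corrected, 2))  # 0.34
```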

In general, these intercorrelations seem to behave in very lawful ways
and are consistent with a multidimensional model of performance.

SUMMARY AND DISCUSSION

Several aspects of the final structure are noteworthy. First, in spite
of some confounding of factor content with measurement method, the latent
performance structure appears to be composed of very distinct components. It
is reasonable to expect that the different performance constructs would be
predicted by different things, so that validity generalization may not exist
across the performance constructs within a job. If this is so, there is a
genuine question of how the performance constructs should be weighted in
forming an overall appraisal of performance for use in personnel decisions.

It is tempting to infer that Effort/Leadership and Maintaining Personal
Discipline, particularly the latter, reflect aspects of performance that are
under motivational control and consequently may be better predicted by
personality or interest measures than by measures of ability or skill. This
leads us to the question of whether choices such as showing up on time,
staying out of trouble, and expending extra effort under adverse conditions
are a function of state or trait variables. We do have considerable data to
focus on the question. It is also interesting that the residual score for
factor 3 becomes more like a "can do" component of performance. It may be
the case that raters cannot separate can do from will do when they are asked
to retrospectively aggregate an individual's task performance and provide an
evaluation of it. If the degree to which an individual exhibits a
characteristic effort level and consistency of performance is not task
specific, then halo might indeed be substantive variance and not error.

Given the high degree of consistency across jobs in the structure of
the performance measures, it is worth asking to what extent our performance
model generalizes to even wider domains of jobs. Some limitations appear
likely. The "general soldiering skills" construct would almost surely be
quite different outside the military. Perhaps it would be replaced by a
more generalized job skill construct. Similarly, it is likely that the
physical fitness and military appearance construct also would be somewhat
different for civilian occupations. The remaining constructs (technical
skill, effort and leadership, and personal discipline) all appear to be basic
components of almost any job.

In generalizing to a wider domain of jobs, it is reasonable to suppose
that other latent structures would fit other "populations" of jobs. For
example, jobs that are not organized into units and that involve a great deal
of written or oral communication (e.g., sales jobs) might have a different
structure. It is tempting to ask how many different performance dimension
structures define different populations of jobs. Such questions go well
beyond the present finding, however, which is that a single structure did fit
the jobs studied.

Since (a) the five-factor solution is stable across jobs sampled from
this population, (b) the performance constructs seem to make sense, and
(c) the constructs are based on measures carefully developed to be content
valid, it seems safe to ascribe some degree of construct validity to them.




REFERENCES 


Campbell, J. P. (1986). When the textbook goes operational. Paper presented
at the 94th Annual Convention of the American Psychological Association,
Washington, D.C.

Campbell, J. P., & Harris, J. H. (1985). Criterion reduction and combination
via a participation decision-making panel. Paper presented at the 93rd
Annual Meeting of the American Psychological Association, Los Angeles.

Dunnette, M. D. (1963). A modified model for selection research. Journal
of Applied Psychology, 47, 317-323.

Joreskog, K. G., & Sorbom, D. (1981). LISREL VI: Analysis of Linear squares
methods. Uppsala, Sweden: University of Uppsala.

Landy, F. J., & Farr, J. L. (1980). Performance rating. Psychological
Bulletin, 87, 72-107.

Schmidt, F. L., & Kaplan, L. B. (1971). Composite vs. multiple criteria: A
review and resolution of the controversy. Personnel Psychology, 24,
419-434.

Wallace, S. R. (1965). Criteria for what? American Psychologist, 20,
411-418.

Wise, L. W., Campbell, J. P., Hanser, L. M., & McHenry, J. J. (1986). A
latent structure model of job performance factors. Paper presented at
the 94th Annual Convention of the American Psychological Association,
Washington, D.C.


Table 1

Summary of Criterion Measures Used in Concurrent
Validation Samples (a)

Performance Measures Common to Batch A and Batch Z MOS (Jobs)

1.  Ten behaviorally anchored rating scales designed to measure factors of
    non-job-specific performance (e.g., giving peer leadership and support,
    maintaining equipment, self-discipline).

2.  Single scale rating of overall job performance.

3.  Single scale rating of NCO (noncommissioned officer) potential.

4.  Paper-and-pencil Test of Training Achievement developed for each of the
    19 MOS (130-210 items each).

5.  A 40-item summated rating scale for the assessment of expected combat
    performance.

6.  Five performance indicators from administrative records. The first
    four are obtained via self-report and the last one from computerized
    records.

    o  Total number of awards and letters of commendation.
    o  Physical fitness qualification.
    o  Number of disciplinary infractions.
    o  Rifle marksmanship qualification score.
    o  Promotion rate (in deviation units).

Performance Measures for Batch A Only

7.  Job-sample (hands-on) test of MOS-specific task proficiency.

    o  Individual is tested on each of 15 major job tasks.

8.  Paper-and-pencil job knowledge tests designed to measure task-specific
    job knowledge.

    o  Individual is scored on 150-200 multiple choice items representing
       30 major job tasks. Fifteen of the tasks were also measured
       hands-on.

9.  Rating scale measures of specific task performance on the 15 tasks also
    measured with the knowledge tests and the hands-on measures.

10. MOS-specific behaviorally anchored rating scales. From 7 to 13 BARS
    were developed for each MOS to represent the major factors that
    constituted job-specific technical and task proficiency.

Performance Measures for Batch Z Only

11. Ratings of performance on 13 representative "common" tasks. The Army
    specifies a series of common tasks (e.g., several first aid tasks) that
    everyone should be able to perform.

Auxiliary Measures Included in Criterion Battery

12. Job History Questionnaire, which asks for information about frequency and
    recency of performance of the MOS-specific tasks.

13. Work Environment Description Questionnaire - a 141-item questionnaire
    assessing situational/environmental characteristics, leadership climate,
    and reward preferences.


(a) All rating measures were obtained from approximately 2 supervisors and 3
peers for each ratee.




Table  2 


JOB PERFORMANCE MEASURE SUMMARY STATISTICS
FOR 11B: INFANTRYMAN

#  VARIABLE  MN  SD  1  2  3  4  5  6  7  8  9  10  11  12  13  14  15  16  17  18  19  20  21  22  23  24  25


1  Ovtrall  HitiM  4.40  0.90  .  90  74  48  77  3S  73  4S  S  12  17-S  34  24  14  4  33  3  11  10  S  19  18  12  14 

2  Eff/Ldr  aitiii*  4.41  0.82  90  .  74  43  80  88  80  47  24  8  13-a  34X12  3X27  10  13  333  M? 

3  OiieioUit  itH  4.3  O.r  74  74  .  49  3  71  43  44  13  3  7-3  3  14  10  3  3  S  4  8  24  13  13  S  IS 

4  FitMM  latiaa  4.14  0.89  48  45  49  .  39  44  32  43  17  27  9>24  22  10  9  -1  10  10  -2  -4  13  4  4  1  1 

3  7m4  S.98  4.3I77M3S9  .847331313  17-3  S271S  33312  10X213  9  14 

4  JtO-Sotc  atiwr  22.47  3.X  B  ■  71  M  X  .  80  47  3  8  14-3  S  3  10  4  33  X  12  12  3  17  3  11  17 

7  CMiat  Enalrr  9.02  1.49  3  80  43  3  73  80  .  3  24  8  13-X  3  3  12  7  3  3  9  14  X  3  3  9  19 

9  CMiat  FrttlMi  10.03  1.44  43  47  M  43  3  47  73  .  14  8  4-3  3  3  7  -1  X  24  9  13  31  3  18  8  14 

9  AMNt  4  Ctrti  3.3  2.18  3  24  13  17  3  3  24  14  .  13  3  -2  4  13  4  -1  14  13  -0  13  9  9  5  4  12 

10  OtiOimi  23.44  3.00  12  8  3313  8  8  8  13  .  11  2-4  1  -7  -9  0  3  -7  -0  8  -2  -1  -4-a 

11  M4  8uliFic.  2.74  0.3  17  3  7  9  1?  14  3  4  3  11  .  1  1  3  4  -0  10  2  3  0  14  10  3  3  4 

12  OrtxeiM  3  0.3  0.85-35-30-39-24-20-21-31-3  -2  2  I  .-45-10  -1  -4-10  -9  -€  -4-10  -1-9  0  -S 

13  Oatt  0.3  0.48  X  X  31  S  S  S  3  3  4  -4  1-43  .  14  7  7  19  17  12  10  18  !4  12  11  17 

14  «  Bifie  _  3.3  10.M  X  3  14  10  3  3  3  3  3  1  13-10  14  .3  4  44  3  3  3  40  24  3  14  ^ 

15  HO  SaFtti  2.47  3.41  14  12  10  9  3  10  3  7  4  -7  4  -1  7  13  .  2  14  I  1  3  14  7  3  3  4 

to  HQ  CaM  3.3  1.33  433  -1  547  -I -I -9-0-4742  .  44  -1  -3  0442-1 

1?  JK  Xue  50.73  9.7l  35  X  3  10  3  33  3  X  14  0  10-10  19  44  14  4  .  48  40  42  45  3  40  3  2 

13  JS  XfttT  3.02  4.31  Snr  10  2X2  24  18  5  2-9  17  3  8  448  .2X  47  41  22  23 

19  jl  Cam  4.37  1.47  11  10  4  -2  12  3  9  9  -0  -7  3  -8  12  3  I  -1  40  2  .  14  X  2  19  14  IS 

3  a  lantxfr  8.2  2.24  10  3  8  -4  10  12  14  15  3  -0  0  -4  10  2  8  -3  42  X  14  .  31  24  18  14  2 

21  9  Xt;e  2.2  14.89  2  2  24  3  X  2  X  31  9  8  14-10  18  40  14  0  43  47  X  31  .  3  40  44  43 

2  S  Safttr  9.51  2.3  19  3  3  4  2  17  2  2  9  -2  10  -1  14  24  7  4  3  41  2  24  3  .  43  34  X 

2  SI  Caa  5.48  1.47  18  3  3  4  2  2  2  18  5  -I  5  -9  3  3  3  4  40  2  19  18  40  45  .  40  31 

24  SI  Utiteit  0.2  3.42  3  9  5  1  9  11  9  8  4  -4  3  0  11  14  3  2  3  2  14  14  44  54  ao  .21 

2  9  UoitifT  2.3  1.14  14  17  3  I  14  17  19  14  12-4  b  -5  17  3  4  -i  2  3  IS  37  43  X  21  Zl  . 

N*  503 




Table  3 


JOB PERFORMANCE MEASURE SUMMARY STATISTICS
FOR 13B: CANNON CREWMAN

#  VARIABLE  MN  SD  1  2  3  4  5  6  7  8  9  10  11  12  13  14  15  16  17  18  19  20  21  22  23  24  25  26  27


:  Gftrtll  tatiaf  4.99  0.79  .  14  71  41  42  72  73  41  11  10  9-2S  30  20  19  17  4  24  II  14  I  3  24  IS  12  If 

:  Eff/I4r  lit:M  4.43  0.74  14  .  ?S  42  49  74  71  41  14  4  1-23  2S  7  29  14  9  32  20  19  11  9  30  20  IS  S  13 

3  3iie:9iiat  Itna  4.41  0.78  71  79  .  91  93  40  43  40  *<1  -4  -i-20  24  12  f  12  4  22  14  19  4  3  18  14  14  a  14 

i  Fitmi  latiBi  4.99  0.12  41  42  91  .  47  S3  Si  39  7  23  -l-S  14  I  4  0  3  S  *1  >1  1-2  4  -4  •<  -4  -9 

9  Ja4-Sm  Tick  23.39  3.SS  42  49  S3  47  .  80  40  39  11  10  1  -2  10  39  II  9  -1  29  10  10  17  I  24  I  12  4  4 

4  Jkk-SMC  Otitr  23.90  3.01  72  74  40  S3  80  .  44  49  4  9  -4  -9  11  29  If  I  1  29  11  19  13  4  24  14  14  4  8 

7  Caaiat  Euairr  9.00  1.44  73  71  43  91  40  44  .  43  14  10  3*19  23  20  23  13  3  22  14  13  4  I  3  12  7-1  l 

3  Ctatat  frakim  9.92  1.94  41  41  40  39  39  49  43  .  I  7  -3-14  24  14  14  a  12  19  17  10  14  I  19  14  9  9  3 

9  Aiiiras  I  Cana  2.91  1 J2  11  14  -O  7  11  4  14  I  .  12  II  0  I  IS  19  19  -1  U  10  4  9  I  11  4  9  8  2 

10  kkya.  laikiatat  241.74  32.70  10  4  *4  23  10  9  10  7  12  .  11  *3  -2  7  2  -7  8  -I  -0-10  9  4  -0  -f-lO-lMS 

11  «4  Saalifik.  2.29  0.49  9  1  -1  -I  1-4  3  -3  18  11  .  4  1  7  I  12  -3  -4  -5  -4  7  -3  -3  -7  0  3  -3 

12  Anieiia  IS  0.44  1.03-25-23-20-B  -2  -9-15-14  0  -3  4  .-31  -0  -4  -9  -5  -7-10-12  -7  i  -9  -4  -2  -5  -3 

13  frtmun  lata  0.01  0.43  30  29  24  14  10  If  3  24  I  -2  1-31  .  4  10  10  3  10  4  5  9  -1  2  9  -2  10  7 

14  HO  Ttak.  90.71  9.94  20  3  12  8  S  B  20  14  19  7  7  -0  4  .  47  20  II  3  13  7  10  12  34  If  20  11  9 

15  HQ  3uie  a.S0  13.00  19  3  9  4  II  18  3  14  19  2  I  -4  lO  17  .3  I  42  3  3  9  19  40  S  17  15  9 

14  HO  Uhxr  40.14  4.3  17  14  12  0  9  8  13  I  19  -7  12  -9  10  20  21  .  11  24  14  II  9  3  3  3  If  II  24 

17  HO  10.40  1.99  4  9  4  3  -1  1  3  12  -1  8  -3  -9  3  1!  I  11  .  1  1  -2  4  9  7  9  -1  1  2 

:3  JK  Ties.  50.47  9,94  24  3  3  9  3  29  3  19  11  -I  -4  -7  10  3  42  24  1  .  58  94  2:  3  44  3  42  3  3 

:?  iS  a<sie  31.91  9.71  18  3  14  -1  10  18  14  17  10  -I  -9-10  4  13  3  14  1  98  .  3  14  3  !2  <9  3  3  3 

20  a  SafttT  3.3  4.43  14  l9  19  -1  10  IS  13  10  4-10  -4-12  9  7  3  11  -2  94  3  .  10  21  41  3  3  3  3 

21  JX  Cmm  1.3  O.tf  8  II  4  1  17  13  4  14  9  5  7  -7  9  10  9  9  a  3  14  10  .  22  29  23  24  14  22 

3  IK  leiatifr  7.22  2.3  3  9  3  -2  8  4  I  8  I  4  -3  1  -I  12  IS  3  9  3  3  3  23  .  20  3  3  20  S 

3  »  reek.  3.12  9.84  24  3  11  4  24  3  3  3  11  -0  -3  -9  2  3  40  3  7  44  92  4!  19  3  .  43  47  3  40 

24  a  lane  3.17  9.3  19  3  14  -4  8  14  12  14  4  •«  -7  -4  9  18  3  3  9  92  49  3  13  3  43  .  5!  40  92 

3  a  SafatT  8.44  2.12  12  19  24  -l  12  14  7  ?  5-10  0  -2  -2  3  17  II  -1  41  3  S  14  3  47  51  .  3  34 

la  a  Caa  2.5!  1.21  3  5  a  -4  4  4  -i  5  8-12  2  -5  10  ii  15  12  i  3  3  3  14  10  3  4<j  2  .  r 


2?  a 


-  n  1  37  9  ’L  mS 

■  •m  *9  ^ 


3  2-15 


«  wa  aa  Q  ^  «a 


!«»  402 




Table  4 


JOB PERFORMANCE MEASURE SUMMARY STATISTICS
FOR 19E: ARMOR CREWMAN

#  VARIABLE

1  Overall Rating
2  Eff/Ldr Rating
3  Discipline Rating
4  Fitness Rating
5  Job-Spec Tech
6  Job-Spec Other
7  Combat Exemplary
8  Combat Problems
9  Awards & Certs
10  Phys. Readiness
11  M16 Qualific.
12  Articles 15
13  Promotion Rate
14  HO Tech.
15  HO Basic
16  HO Safety
17  HO Comm
18  JK Tech.
19  JK Basic
20  JK Safety
21  JK Comm
22  JK Identify
23  SK Tech.
24  SK Basic
25  SK Safety
26  SK Comm
27  SK Vehicle
28  SK Identify

N = 325

MN  SD  1  2  3  4  5  6  7  8  9  10  11  12  13  14  15  16  17  18  19  20  21  22  23  24  25  26  27  28


4.42  0.78  .  84  72  31  72  S3  4f  41  12  20  3>37  41  12  15  14  4  25  3  22  19  10  27  2s  18  22  13  ' 

4.38  0.74  84  .  48  SS  74  30  80  45  14  21  8*32  41  17  24  19  17  31  34  31  27  i4  23  32  r  22  22  I! 

4.30  0.83  72  48  .  43  53  41  35  44  -1  ::-14-35  38  4  IS  14  2  21  23  18  17  6  27  18  8  Z  14  -2 

4.74  0.82  36  38  43  .  44  39  43  34  10  43  -0-19  21  8  2  14  -3  1  3  10  3  -4  0  -0  -0  4  :  : 

3.19  3.20  72  74  3  44  .  73  71  33  10  14  17-31  34  3  17  19  13  3  3  24  19  3  27  3  IS  15  3  s 

14.71  1.89  a  SO  41  39  3  .  SO  41  9  13  13-18  19  13  9  13  2  8  7  12  2  4  l£  .2  S  •  «  i 

3.3  1.34  49  80  3  43  71  30  .  43  15  18  8-3  34  IS  3  15  14  3  3  19  20  7  3  3  19  10  13  2 

9.80  1.47  41  45  44  34  J8  41  43  .  -1  7  4-31  3  13  3  13  4  24  18  3  13  8  24  18  17  15  lo  •; 

2.fi  1.40  12  14  -1  10  10  9  IS  -1  .  3  19  -7  3  4  4-3  3  5  7  -0  10  -2  12  12  3  4  3  3 

249.41  3.11  3  21  12  43  14  13  18  7  3  .  -l-IO  10  -3  -3  4  2  -4  0  -4.  4  1  <4  1  2-2  2-7 

2.40  0.48  8  8-14  -0  17  13  8  4  19  -1  .  14  -1  7  7  3  10  11  12  13  17  31  10  4  12  2  Is  -I 

0.33  0.77-37-22-35-19-31-18-32-31  -7-10  14  .-43  -9  -8-14  1-13-17-17  -7  1-19-13-12  -0  -7  -a 
0.03  0.38  41  41  38  28  34  19  34  29  13  10  -1-43  .  10  7  15  12  14  24  28  21  2  17  2  18  4  I5  I 


30.00  9.99  12  17  4  8  a  15  IS  13  4  -3  7  -9  10  .  18  24  a  34  27  a  13  18  3  18  9  •  •  •  0

38.14  2.48  IS  24  13  2  17  9ar  4  -3  7  -8  7  18  .  aaaa3ai8  2:  3  •  A  s  9  19  (f

21.15  2.95  14  19  14  14  19  12  15  13  •3  4  M4  15  24  21  f  14  a  18  18  10  4  13  13  «  ¥  t  ¥  •  •  *  «  3

3135  7.59  4  17  2  -3  13  2  14  4  13  2  10  1  a  »  3  14  .  a  a  a  a  1:  a  a  IS  m  «  22  •

30.00  9.99  3  31  21  1  a  8  20  24  3  -4  11-13  14  34  a  eM»  a  .  40  a  43  34  44  40  •  •  a  •m  •

42.14  7.28  23  34  a  3  a  73  18  7  0  12-17  24  a  a  18  a  40  . 45  a  a  45  47  44  41  43  4

21.19  4.13  22  31  18  10  24  12  19  Z  -0  -4 13-17  a  a  3  18  a  a  45  .  44  34  44  5;  24  5

11.33  3.59  19  27  17  3  19  2  »  13  10  4  17  -7  21  13  21  10  a  43  a  44  . 14  43  51  3*  20  24  •  •

10.05  1.71  10  !4  4-4  8  4  7  8  -2  1  31  i  2  18  18  4  11  34  a  34  Is  .24  3  IS  •  >  •

34.34  9.44  n  S  27  0  a  15  a  24  a  -4 10-19 17  a  a  15  a  4«  45  44  43  24  .  •  %  r*  •  as  59  ‘^8  ••

34.94  8.44  24  32  11  -0  23  12  25  11  12  1  4-13  S  18  25  13  a  40  47  31  31  a  75  s  >  m  47  •  •  •  f  •  m

3.18  2.14  11  a  8  -0  18  1  19  17  3  2  12-13  13  9  11  3  13  44  44  37  34  r  a  48  •  a  22  9  m

7.39  1.80  a  a  a  4  13  9  10  IS  4  -2  2  -0  4  2  4  3  3  a  41  24  a  18  59  •  •  mm  s  m  •  ••  •  •

0.54  0.50  7  11  -2  2  4  1  2-1  8  -7  -1  -4  1  0-0  4  3  7  4  3  2  3  21  **  a  >  •  9  •

3.01  0.94  18  a  14  2  20  9  18  14  8  2  14  -7  15  19  19  17  a  42  43  a  24  37  48  47  a  *  •  •  9



Table  5 


JOB PERFORMANCE MEASURE SUMMARY STATISTICS
FOR 31C: SINGLE CHANNEL RADIO OPERATOR

#  VARIABLE

1  Overall Rating
2  Eff/Ldr Rating
3  Discipline Rating
4  Fitness Rating
5  Job-Spec Tech
6  Job-Spec Other
7  Combat Exemplary
8  Combat Problems
9  Awards & Certs
10  Phys. Readiness
11  M16 Qualific.
12  Articles 15
13  Promotion Rate
14  HO Tech.
15  HO Basic
16  HO Safety
17  HO Comm
18  HO Vehicle
19  JK Tech.
20  JK Basic
21  JK Safety
22  JK Comm
23  JK Vehicle
24  JK Identify
25  SK Tech.
26  SK Basic
27  SK Safety
28  SK Vehicle
29  SK Identify

N = 22

MN  SD  1  2  3  4  5  6  7  8  9  10  11  12  13  14  15  16  17  18  19  20  21  22  23  24  25  26  27  28  29


4.73  0.77  .  83  3  44  74  44  44  44  17  11  2-31  30  3  24  IS  18  -2  24  17  7  14  3  13  17  10  14  2  -2 

4.48  0.72  83  .  48  87  81  71  41  43  II  12  7-31  3  24  21  21  18  2  3  3  14  14  4  13  3  12  3  12  4 

4.44  O.n  3  48  .  82  54  n  33  40  4  4-ll>32  3  10  14  7  10  -1  3  18  4  18  4  7  7  -4  19  -2  *8 

5.05  0.M  44  57  52  .  47  40  42  42  11  3  -4-0  24  12  I  10  4  -2  1  2  0  8  -4  4  -5  -6  -2  -7-12 

14.27  2.01  74  81  3  47  .  74  44  57  14  4  8-14  22  3  24  3  11  -1  27  3  14  15  8  18  17  IS  14  13  •; 

14.37  2.07  44  71  n  40  74  .  3  48  3  -3  -3-17  2  11  18  18  1  0  17  22  8  7  2  3  5  2  7  3  -7 

7.07  1.3  44  41  S3  42  44  54  .  77  11  1  5-21  17  4  13  II  23  -7  3  3  18  17  11  12  18  7  14  10  7 

10.47  1.71  44  43  40  42  57  48  77  .  7  -1  -2-22  14  4  14  11  IS  -0  22  24  3  14  -1  5  IS  5  7  9  -3 

2.14  1.78  17  II  4  11  14  3  11  7  .  23  10  2  12  7  12  4  3  2  10  10  11  -0  -5  8  I  12  4  4  4 

287.54  3.3  11  12  4  3  4  -3  1  -I  23  .  4-U  4  I-IO  0  1  -4  -4  -8  4  1  3  -8  -4  -0  1-13  -5 

2.14  0.77  2  7-11  -4  5  -3  5  -2  10  4  .  4  3  4  5  10  7  8  7  10  8  -4  -4  5  7  4  4  II  19 

0.3  0.l4-3l-3l-32-O-l4-l7-a-22  2-11  4  .-3  -7  -3  -7-12  -3-14  -7-13-20-10  -3-11  -4-12  -4  -2 

•0.02  0.84  3  3  24  24  22  22  !7  14  12  4  3-3  .  8  12  21  7  5  18  17  10  I7  13  12  13  IS  12  4  -9 

78.44  7.47  3  24  10  12  3  11  4  4  7  1  4  -7  8  .33  3  7  42  3  3  21  22  IS  3  21  24  •  : 

a.S  3.3  24  21  14  8  24  II  13  14  12-10  8  -3  12  3  .  II  3  I  3  3  18  18  5  3  3  24  27  10  IS 
3.18  3.77  18  3  7  10  3  18  If  11  4  0  10  -7  3  3  18  .  8  14  10  3  13  7  4  3  11  1C  17  4  7 

14.3  4.37  18  18  10  4  11  1  3  18  3  1  7-12  7  3  3  3  .  1  34  27  3  3  3  3  3  17  i:  S3 

11.3  1.31  -2  2  -I  -2  -1  0-7-0  2-4  $  -3  5  7  8  14  I  .  11  7  10  2  -4  7  3  14  14  12  11 

57.14  11.41  24  3  3  1  3  17  3  3  10  -4  7-14  ll  42  3  10  34  11  .  40  5?  40  37  3  3  w  !v  -  3 

2.12  4.41  17  3  IS  2  3  2  3  24  10  -f  10  -7  17  a  3  a  3  7  40  .»  3  3  31  47  42  43  «  20 

3.31  4.43  7  14  4  0  14  8  IS  3  11  4  8-13  10  3  18  IS  3  10  57  3  .  50  3  20  ta  m  48  2i  24 

10.12  L74  14  14  15  8  15  7  17  14  -0  1  -4-»  I?  21  15  7  3  2  40  3  SO  .  3  i*  *4  a  24  27  1' 

4.54  I.C  3  4  4  -4  B  2  11  -1  -5  3  -4-10  13  2  5  4  3  -4  3332  .  I?  3  14  le  13  11 
4.2  2.13  13  13  7  4  IS  5  12  5  8  -8  5  -3  12  15  3  8  3  7  3  3  3  17  I7  .  27  3  21  ll  W 

V.S7  15.43  17  3  7  -5  17  5  18  15  1  -4  7-11  13  3  3  11  3  3  2  47  aa  44  3  3  .  42  3  43  3 

10.3  2.74  10  2  -4  -8  IS  2  7  5  2-0  4  -4  IS  3  24  10  I7  14  47  42  40  3  14  3  42  .  5e  42  ^ 

ll.M  2.81  14  3  10  -2  14  7  14  7  4  1  4*2  2  24  3  17  11  14  3  43  <8  3  14  3  3  54  .  4;  r 

3.3  1.84  2  12  -2  -7  13  3  10  0  4-13  11  -4  4  7  10  4  5  2  M  40  34  3  13  11  48  42  4!  .  - 

1.14  0.2  -2  4  -8-12  -1  -7  7  -3  4  -5  iO  -3  -0  8  15  7  3  11  3  3  24  17  ll  tO  3  3  3  3  . 




Table  6 


JOB PERFORMANCE MEASURE SUMMARY STATISTICS
FOR 63B: LIGHT WHEEL VEHICLE MECHANIC

#  VARIABLE

1  Overall Rating
2  Eff/Ldr Rating
3  Discipline Rating
4  Fitness Rating
5  Job-Spec Tech
6  Job-Spec Other
7  Combat Exemplary
8  Combat Problems
9  Awards & Certs
10  Phys. Readiness
11  M16 Qualific.
12  Articles 15
13  Promotion Rate
14  HO Tech.
15  HO Basic
16  HO Safety
17  HO Vehicle
18  JK Tech.
19  JK Basic
20  JK Safety
21  JK Vehicle
22  SK Tech.
23  SK Basic
24  SK Safety
25  SK Comm
26  SK Vehicle

N = 403

MN  SD  1  2  3  4  5  6  7  8  9  10  11  12  13  14  15  16  17  18  19  20  21  22  23  24  25  26


4.SS  0.84  .  3  7S  57  75  75  18  65  20  7  -4-24  24  IJ  -1  5  10  20  15  Z  II  21  2  2:  IS  :« 

4.31  0.83  U  .  75  SO  84  78  19  II  21  1  -S-23  Z  19  -1  3  12  S  U  27  18  21  S  !9  14  Z 

4.S4  0.9  75  75  .  51  13  15  59  II  15  2  -9-27  21  10  -5  7  an  5  ja  3  14  Z  20  13  I* 

4.82  0.3  S7  SO  SI  .  38  49  4i  41  13  31  2-Z  Z  -2  -2  8  7  -0  2  8  -0  -2  II  14  13  3 

Z.42  4.10  75  84  1338  .78I5S7Z-1  -5-11  II  Z  1  3  13  Z  Z  21  19  Z  21  18  I  Z 
Z.19  3.Z  Z  78  IS  49  78  .  II  55  18  S  -8-18  17  12  4  4  12  18  17  Z  13  Z  18  IS  9  Z 

8.r  1.11  11  19  59  44  1511  .19  14  4  -7-11  17  13  0  9  9  II  11  Z  8  Z  18  14  S  13 

9.92  1.3  15  3  U  41  Z  55  19  .14-0  -l-Z  Z  10  -3  4  9  17  11  Z  7  19  21  IS  II  13 
2.Z  1.11  Z  Z  IS  Z  21  18  14  14  .  4  2-11  7  11  -5  -0  7  7  2  12  11  13  14  10  3  IS 

255.47  31.93  7  1  2  31  -I  5  4  -0  4  .  10-10  15  1  8  3  -I  -7-12  -2  -9-10  1  0  -3 

2.19  O.n  -4  -5  -8  2  -5  -8  -7  -I  2  10  .  1-9-2  5  -4  -0  -I  3  3  2  -2  -2  2  -0  i 
O.Z  0.85-24-23-Z-20-1I-18-II-20-11-10  1  .-3  -3  -2  -2  -4  -7  -5  -I  -0  -l-ll  -7-13  -3 
0.04  0.Z  24  Z  Z  Z  II  17  17  Z  7  15  -4-3  .  -5  -4  -2  -1  13  f  4  8  13  II  IS  9  13 

110.11  1.3  11  19  10  -2  Z  12  13  10  11  1  -2  -3  -S  .  8  I  18  9  Z  19  Z  Z  19  II  4  24 

3.91  4.09-1-1-S-2  I  4  0  -3  -5  8  5  -2  -4  8  .  10  7  1  12  14  12  10  7  15  -i  ;4 

21.92  3.Z  5  3  7  8  3  4  9  4  -0  3  -4  -2  -2  I  10  .  2  2  5  IS  1  1  2  7-7-0 

il.Z  1.3  10  12  9  7  Z  12  9  9  7  -1  -0  -4  -1  18  7  2  .  Z  I  4  11  17  0  a  2  13 

16.11  11.93  Z  Z  II  -0  Z  II  II  17  7  -7  -4  -7  13  Z  I  2  Z  .  12  47  |2  17  SO  Z  3  Z 

24.3  4.3  Z  II  S  2  Z  17  U  11  2-Z  3-5  9  Z  Z  5  I  12  .  45  44  47  41  3  Z  > 

18.91  3.8S  Z  Z  19  8  3  Z  Z  Z  Z  -2  3  -1  4  19  14  18  4  47  Z  .  3  40  3  Z  Z  2? 

1S.31  4.Z  11  II  3  -0  19  13  8  7  11  -9  2  -0  8  Z  Z  1  11  62  3  M  .  3  Z  2:  24  i* 

3.3  Z.3  21  3  14  -2  Z  Z  Z  19  13-10  -2  -I  13  Z  10  I  17  17  47  40  3  .  Z  47  K  j? 

11.3  4.24  Z  Z  Z  II  Z  18  18  Z  14  I  -2-11  ll  19  7  2  I  3  4i  3  Z  Z  .  11  SC  3 

1.32  1.74  Z  19  Z  14  il  If  14  18  10  0  :  -7  15  11  IS  7  I  Z  3  Z  31  47  I;  .  Z  SC 

0.90  9.3  15  14  13  13  I  9  8  18  9  -3  -9-13  8  4  -1  -7  2  3  Z  29  24  Z  50  Z  .  Z 

24.10  5.3  19  Z  14  8  Z  Z  13  18  18  -4  4  -8  12  24  14  -0  13  Z  44  Z  49  19  3  50  Z  . 




Table  7 


JOB PERFORMANCE MEASURE SUMMARY STATISTICS

FOR 64C: MOTOR TRANSPORT OPERATOR
#  VARIABLE  M  SD  1  2  3  4  5  6  7  8  9  10  11  12  13  14  15  16  17  18  19  20  21  22  23  24


1  Overall Rating
2  Eff/Ldr Rating
3  Discipline Rtng
4  Fitness Rating
5  Job-Spec Tech
6  Job-Spec Other
7  Combat Exemplry
8  Combat Problems
9  Awards & Certs
10  Phys. Readiness
11  M16 Qualific.
12  Articles 15
13  Promotion Rate
14  HO Basic
15  HO Safety
16  HO Vehicle
17  JK Basic
18  JK Safety
19  JK Vehicle
20  JK Identify
21  SK Basic
22  SK Safety
23  SK Comm
24  SK Vehicle


4.52  0.76  .  64  71  43  72  39  48  S8  13  11 

4.34  0.75  84  .  77  59  78  49  74  58  17  • 

4.53  0.81  78  77  .  52  47  31  38  34  10  3 

4.74  0.87  43  59  52  .  34  39  44  35  3  2B 

29.41  3.74  72  78  47  54  .  78  45  32  13  4 

17.79  2.52  39  49  51  39  78  .  43  41  18  4 

8.80  1.45  48  74  38  44  45  43  .  45  12  4 

9.50  1.43  38  SB  54  35  52  41  45  .  I  >3 

3.12  2.08  13  17  10  3  13  18  12  8  .  4 

248.48  37.70  11  9  3  28  4  4  4  -3  4  . 

2.09  0.75  4  4  -2  3  7  13  11  2  11  3 

O.t'i  0.98-30-25-29-»-2l-I5-2l-24  5  -4 
-0.01  0.57  33  31  35  21  2!  19  22  24  12  -I 

43.44  10.14  8  14  4  -2  9  12  20  12  8  -1 

33.73  9.84  20  20  14  3  14  11  19  IS  4  3 

33.30  4.19  IS  18  14  14  19  14  14  10  5  2 

27.28  5.82  14  2  15  5  17  17  »  14  -3  -* 

33.42  3.42  17  22  15  4  20  14  15  17  -2  -4 

35.40  7.70  19  24  19  4  19  I9  »  r  1-4 

2.15  1.41  3  7  1  2  5  4  5  7  4  2 

14.41  4.34  13  21  14  5  14  13  18  15  3  -2 

4.44  1.93  12  17  12  4  17  17  8  14  4  -5 

0.89  0.32  14  15  IS  7  12  7  10  20  -1  0 

35.72  10.07  14  21  16  -2  15  14  IS  19  2-9 


4*30  33  8  20  15  14  17  19  3  12  12  le  14 

4- 25  31  14  29  18  2  r  24  7  21  IT  IS  21 

-2-29  35  4  14  14  15  IS  19  1  14  12  15  14 

3-20  21  -2  8  14  5  4  4  2  5  4  7  -2 

7-21  25  9  14  19  17  20  19  5  14  17  12  15 

13-15  19  12  11  14  17  14  I9  4  13  17  7  li 

11-21  22  20  19  14  20  IS  »  5  18  3  10  IS 

2- 24  24  12  IS  10  14  17  22  7  IS  14  20  19 

11  5  12  8  4  5  -3  -2  1  6  3  4  -1  2 

3  -4  -1  -1  3  2  -4  -4  -4  2  -2  -5  0  -8 

.  4  -5  9  13  5  7  5  3  -5  -1  -3  1 

4  .-34  -1-11-11  -7-13-12  0  -5  -8-I2 
-5-34  .10  9  10  9  12  11  5  11  9  S  il 

9  -1  10  .  29  10  44  31  30  7  23  21  a  r 

13-11  9  29  .  14  n  31  24  4  24  19  14  Zi 

5- li  10  10  14  .  5  S  15  3  10  i:  :  11 

7  -7  9  44  27  5  .  47  54  10  47  39  2v  »9 

5-13  12  31  31  8  47  .  49  4  42  47  23  *9 

3- l2  li  30  24  15  54  49  .  ij  49  ;0  27  !! 

•0  0  5  7  4  3  13  4  II  .  17  10 

-I  -5  11  28  24  10  47  42  49  17  .  56  43  aS 

-3  -8  9  21  19  11  29  47  40  10  54  .  34  5? 

l-i:  8  4  14  I  20  23  r  -2  43  34  .  ~ 

-1  -4  11  E  24  i:  49  49  E  13  aS  59  37  . 


N = 477




Table  8 


JOB PERFORMANCE MEASURE SUMMARY STATISTICS
FOR 71L: ADMINISTRATIVE CLERK

#  VARIABLE  M  SD  1  2  3  4  5  6  7  8  9  10  11  12  13  14  15  16  17  18  19  20  21  22  23  24


1  0»mll  laUiM  4.92  0.13  .  33  71  37  72  43  43  S9  20  24  4*23  20  17  14  3  22  IS  17  21  13  11  5  10 

2  iff/UT  latiaj  4.44  0.78  83  .  73  S4  73  4S  70  40  21  19  2*19  19  23  14  2  29  17  18  23  17  9  7  11 

3  OiseifliM  8409  S.Ol  0.88  71  73  .  47  43  S3  SB  SB  13  13  4-27  19  20  10  -3  22  13  11  20  7  S  1  4 

4  piUMi  Iatii9  3.23  0.89  37  S4  47  .  40  39  33  49  20  35  S-23  20  3  7  -3  1  2  2  -1  0  -S  0-2 

5  Jt4-5m  Ttco  19.81  2.73  72  73  43  40  .  74  S4  SO  8  7  -9-21  21  24  8  -2  21  14  14  28  10  9  4  7 

4  J4l-S4te  OMtr  18.37  3.13  43  4SSS39  74  .3044  10  13  -1-21  17  2  13  1  S  IS  14  24  8  9  10  8 

7  CMlit  EaielrT  8.74  1.83  43  70  SI  33  34  30  .  72  24  19  1-15  II  9  20  11  13  23  17  14  13  8  I  23 

8  CMlat  10.72  1.93  39  40  SB  49  SO  44  72  .  21  14  7-S  13  12  14  4  11  24  12  IS  13  9  i  14 

9  Omcns  I  Ctrti  2.42  1.73  20  21  13  20  8  10  24  21  .  17  20  -4  •  -0  10  -1  -0  3  11  -0  -2  -2  5  1 

10  .•hM.  iMitant  240.40  33.39  24  19  13  33  7  13  19  14  17  .  11  -9  5  1  4  3  0  -3  8  3  4  12  2  S 

11  H14  3«alifx:.  1.84  0.80  4  2  4  3  -3  -1  3  7  20  11  .  3  2  -4  12  8  -4  7  3  -3  2  -7  -1  12 

12  AnielM  IS  0.22  0.42-23-19-27-23-21-21-13-22  -4  -9  3  .-42-13  -3  1-10  -7  2-10  -5  -J  -5  * 

IS  8«tt  0.01  0.44  20  19  19  20  21  17  18  13  9  3  2-42  .  12  3  2  4  4  9  3  7  a  4  -0 

14  HQ  Ttc4.  84.09  14.24  17  23  20  3  24  22  9  12  -0  I  -4-13  12  .  28  13  SB  34  33  38  23  23  7  II 

13  «  8«:e  18.34  3.00  14  14  10  7  8  13  20  14  10  4  12  -3  3  28  .  43  29  48  33  23  24  17  4  23 

14  HO  Ufttr  20.34  4.00  3  2  -3  -3  -2  1  11  4  -1  3  8  1  2  13  43  .  11  28  23  ?  13  10  9  17 

17  A  Tiea.  42.21  9.33  22  29  22  1  28  22  13  11  -0  v  -4-10  4  38  29  11  .  47  48  73  42  24  17  i7 

18  JK  8atie  23.2  3.14  IS  17  IS  2  14  IS  2  24  3  -3  7  -7  4  34  48  28  47  .  30  40  44  27  27  28 

19  JK  Sifftr  14.24  3.01  17  18  11  2  14  14  17  12  11  8  3  2  9  33  33  23  48  30  .  43  3  32  19  23 

20  S  Ttcl.  44.99  9.78  21  3  20  -1  3  24  14  13  -0  3  -3-10  333  7734043  .4433  13  14 

21  S  3uxe  9.90  2.3  13  17  7  9  10  I  13  13  -2  4  2  -5  7  3  3  13  42  44  3  44  .  32  13  3! 

22  a  SaftiT  4.3  1.3  11  9  8  -3  9  9  8  9  -2  12  -7  -3  4  3  17  10  24  27  r  3  32  .  4  IS 

3  a  3m  0.3  0.48  3  7  1  9  4  10  8  1  3  2  -1  -3  4  7  4  9  17  27  19  1!  18  4  .  i: 

24  a  Vtixeli  2.71  1.21  10  U  4  -2  7  9  S  14  i  8  13  4  -9  11  3  17  17  3  3  le  31  IS  11  . 


Table  9 


#  VARIABLE


1  Overall Rating
2  Eff/Ldr Rating
3  Discipline Rtng
4  Fitness Rating
5  Job-Spec Tech
6  Job-Spec Other
7  Combat Exemplry
8  Combat Problems
9  Awards & Certs
10  Phys. Readiness
11  M16 Qualific.
12  Articles 15
13  Promotion Rate
14  HO Tech.
15  HO Basic
16  HO Safety
17  JK Tech.
18  JK Basic
19  JK Safety
20  JK Vehicle
21  JK Identify
22  SK Tech.
23  SK Basic
24  SK Safety
25  SK Vehicle

a*  Sa 


JOB PERFORMANCE MEASURE SUMMARY STATISTICS

FOR 91A: MEDICAL SPECIALIST

M  SD  1  2  3  4  5  6  7  8  9  10  11  12  13  14  15  16  17  18  19  20  21  22  23  24  25


4.41  0.82  .  84  78  40  47  42  71  70  a  17  *2-2f  a  17  4  13  3  24  a  4 

4.40  0.77  84  .  76  54  73  47  3  71  24  13  -4-30  33  3  f  19  3  3  21  -2 

4.54  0.91  3  74  .  47  40  47  3  49  a  7  -8-3  31  IS  11  13  3  21  3  -4 

4.74  0.92  40  54  47  .  41  3  49  47  10  3  0-3  18  3  -0  -0  3  7  4  7 

3.09  3.24  47  3  40  41  .  47  3  54  IS  4  -1-3  3  18  2  13  S  14  14  -3 

18.47  2.3  42  47  47  3  47  .  44  SI  3  7  9-17  3  10  4  14  18  3  20  S 

9.3  1.48  71  3  3  49  3  44  .  79  3  9  9-3  3  14  10  IS  3  3  3  1 

10.11  1.77  3  71  49  47  54  SI  79  .3  S  -5-3  3  14  4  11  24  3  3  -1 
3.04  2.01  3  24  a  10  IS  3  3  3  .  14  34  -8  13  3  7  3  4  10  3  11 

23.71  31.94  IS  13  7  3  4  7  9  5  14  .  17-11  -2  4  -*  -5  -3  -7  -2  3 

2.M  0.3  -2  -4  -8  0  -1  9  9  -5  34  17  .  -I  -4  3  0  8  -8  5  -7  -0 

0.41  0.89-29-30-29-20-27-17-20-3  -S-ll  -I  .-3-10  I  ->10  -7-4  3 

-0.00  3.3  3  3  31  18  3  a  3  »  13  -2  -4-3  .  10  9  7  14  3  9  -9 

3.48  10.02  17  3  15  3  18  10  14  14  3  4  3-10  10  .  14  3  3  3  3  2 

9.3  3.M  4  9  11  -0  2  4  10  4  7  -4  0  1  9  14  .  17  3  37  3  9 

.3.3  4.3  13  19  13  -0  13  14  15  11  3  -S  3  -7  7  34  17  .  3  32  3  3 

3.S  13.71  3  3  3  3  a  18  a  24  4  -3  -8-10  14  39  3  3  .  54  78  12 

15.19  3.3  24  3  21  7  14  3  3  3  10  -7  5  -7  »  3  3  3  54  .3  3 

42.71  7.3  3  21  3  4  14  3  a  3  8  -2  -7  -4  9  3  3  3  3  3  .3 

2.42  1.04  4  -2  -4  7  -3  3  1  -I  11  3  -0  a  -9  2  9  3  13  8  12  . 

4.a  232  8  13  4  1  3  IS  18  9  18  -3  12  -S  11  13  14  17  14  24  14  10 

91.45  17.3  333-13333  4-8  -4-l3  18  44  17  3  47  41  3  2 
2.04  0.3  a  14  14  2  5  11  3  3  11  -3  -2-14  11  8  18  10  3  3  21  -3 

5.77  1.54  15  14  11  -4  15  18  17  14  a  -8  2  -4  14  3  3  3  48  3  49  6 

4.51  i.qZ  4  9  10-15  7  14  a  0  -?  2  -I  i:  14 11 18  a  a  a  a 


3  a  12  15  3 
a  3  14  14  ? 

4  3  14  11  19 
1  -I  2  -4-15 
3  3  5  15  ■ 
IS  a  11  18  14 
18  3  :c  17  i: 

9  3  3  14  i: 
IS  4 II  a  0 

•5  -6  -2  -8  -7 


i:  18 
13  44 


14 

24 

la 

10 


'14  “4  “1 
11  1-  11 
8  a  14 


18  a 




99 


-3 


49  2: 
4  0 


15  .  24  52  34 


3 






Table  10 


JOB PERFORMANCE MEASURE SUMMARY STATISTICS
FOR 95B: MILITARY POLICE

#  VARIABLE  M  SD  1  2  3  4  5  6  7  8  9  10  11  12  13  14  15  16  17  18  19  20  21  22  23  24  25  26  27  28  29  30


1  Ovtrali  Satina  4.74  0.80  .  T  49  70  78  M  74  70  18  2  t>a  21  15  18  8  4  1  12  10  8  4  9  8  19  3  7  n  8  -e 

2  eff/ltfr  Satina  4.50  0.73  87  .  71  41  77  72  78  68  20  17  11-2  19  14  21  10  10  I  10  13  7  7  15  iO  18  13  12  o  «  -5 

3  OisciPiinf  Stna  4.71  0.77  49  71  .  44  45  48  55  43  4  7  4-2  24  4  9  5  7  -3  11  10  4  3  12  12  i!  IS  14  4  13  -2 

4  ritnfss  Satina  4.90  0.84  70  41  44  .  56  94  55  2  to  43  13-24  14  9  12  7  5  2  0  3  -2  -0  3  -3  3  5  2  •!  I 

5  Int-SNC  Tten  29.00  3.44  78  77  45  58  .  73  48  43  IS  15  11-19  17  12  19  8  12  2  14  14  9  9  9  7  17  11  12  5  5  -2 

4  Jt4-5pte  OUtr  2.40  3.10  48  2  48  94  2  .  71  41  2  18  2-14  8  11  2  14  19  7  2  13  4  4  u  7  9  12  4  5  4  -2 

7  Caalat  SaMlrr  9.54  1.34  74  78  55  SS  48  71  .  79  19  19  14-17  14  17  19  14  14  4  10  17  11  7  IS  4  19  15  9  9  3  : 

8  Castat  9i>«I1mi  10.45  1.2  70  48  43  2  43  41  79  .  IS  IS  19*2  21  14  14  11  10  -0  14  18  17  11  14  10  19  15  «  4  13  0 

9  Aaaras  4  Carti  3.17  2.09  18  2  4  14  IS  2  19  IS  .  2  2  -3  11  8  14  4  3  11-11  2  -1  7  -0  9  4  4  lO  4  «  -0 

10  »•!?$.  SfBdiatti  21.75  2.78  2  17  7  43  15  18  19  15  2  .  13-12  7  -1  4  4  2  4  -4  1  -3  -3  -2-12  -2  -8  -4  -3  2  -5 

11  1116  Suaiifie.  2.2  0.74  13  11  4  13  11  2  14  IS  2  13  .  1  -3  4  4  5  4  4  -3  7  3  2  -9  -1  •  -)  -2  -2  -I  4 

12  ArtielM  15  0.2  0.70-28-2-2-24-19-14-17-2  -3-12  I  .-39-4-0-8  -3  2  -8  -8  -5  -2  0  0  -7  -4  -3  5  •?  4 

13  Srmtioa  Sati  0.01  0.47  2  19  24  14  17  8  14  21  11  7  -3-39  .  4  4  4  I  -3  15  IS  14  10  4  2  0  10  7  1  2  -1 

14  HO  Tten.  31.2  4.43  IS  14  4  9  12  11  17  14  8  -t  4  -4  4  .  18  12  4  11  13  11  10  7  2  5  14  *  3  IS  4-7 

15  HO  3at:e  50.04  10.2  18  21  9  12  19  2  19  14  14  4  4  -0  4  18  .  2  2  18  18  34  24  2  17  5  12  2  2  :2  15 

14  HO  Safftr  31.74  9.14  8  10  5  7  8  14  14  11  4  4  5  -8  4  12  2  .  9  IS  10  2  2  2  12  9  19  17  IS  9  10  -4 

17  HO  Com  10.2  2.17  4  10  7  5  2  19  14  10  8  2  4  -3  1  6  2  9  .  31  14  2  13  2  U  7  9  2  14  I;  ll-lO 

15  HO  vtnicit  10.54  1.43  I  I  -3  2  2  7  6  -0  11  4  4  2  -3  II  18  15  31  .  I  4  8  19  14  2  11  12  9  i:  13  -1 

19  JX  Tten.  2.44  5.90  12  10  11  0  14  2  10  14-11  -4  -3  -8  15  13  IS  10  14  1  .  4C  53  2  19  15  40  2  2  24  1?  1 

20  JX  Basie  90.11  9.99  10  13  10  3  14  13  17  18  2  I  7  -8  I5  11  34  2  2  4  40  .  40  51  32  2  2  49  44  2  31  - 

:i  jX  Saftir  2.52  4.2  8  7  6  -2  9  4  11  17  -I  -3  3  -5  16  10  2  2  13  8  2  40  .  40  24  20  2  37  2  2  2  0 

2  JX  Cnoa  13.54  4.42  4  7  8  -0  5  4  7  II  7  -3  2  -2  10  7  2  2  2  19  2  51  40  .  2  18  2  2  31  2  24  -3 

a  JX  ‘Jenieif  2.03  1.19  9  19  2  3  9  14  15  14  -0  -2  -9  0  4  3  17  2  14  14  18  2  24  2  .  15  18  2  2  2  17  -4 

24  JX  lOentifr  4.2  2.29  8  10  2  -3  7  7  4  10  9-2  -1  0  2  5  5  9  7  2  IS  2  2  18  19  .  21  2  21  17  12  -2 

3  3  7fcn.  40.2  7.04  19  18  15  3  17  9  19  19  4  -2  1  -7  0  14  2  15  9  11  40  2  34  2  IH  21  .  4a  49  2  r  2 

2a  3X  Basie  17.2  3.2  8  13  18  5  ll  12  15  15  4  -8  -0  -4  10  7  2  17  21  12  2  49  37  S  2  20  4?  .  bO  40  44  -• 

r  sX  Safftr  14.45  3.2  7  12  14  2  2  4  9  9  10  -4  -2  -3  7  3  2  la  14  9  3  44  3  31  2  21  49  40  .  2  40  -a 

2  3  Caoa  3.12  1.2  4  8  4  -1  5  5  9  4  4  -3  -2  5  I  10  2  9  17  13  24  2  2  34  2  17  3  40  29  .2  -0 

3  3  iJtnielt  4.02  1.90  8  9  10  1  5  4  8  13  9  2  -1  -7  2  4  15  10  11  10  19  31  3  24  *7  12  2  44  40  2  .  -i 

30  3  UaatifT  0.3  0.51  -4  -5  -3  -7  -2  -2  I  0  -0  -5  4  4  -I  -0  -7  -4-10  -I  I  4  0  -3  -4  -2  2  -1  -4  •;  . 

504 




Table  11 


FACTOR LOADINGS
SEPARATE MODEL FOR EACH JOB

                    Military Occupational Specialty
Construct/Factor    11B   13B   19E   31C   63B   64C   71L   91A   95B

Core Technical
  HO Tech           --    .61   .47   .64   .51   .29   .77   .59   .32
  JK Tech           --    .75   .78   .79   .74   .26   .78   .75   .32
  SK Tech           --    .70   .79   .73   .82   .55   .229  .81   .43
  MOS Tech Rtng     .45   .10   .22   .25   .25   .34   .10   .13

General Soldiering
  HO Soldier        .60   .51   .46   .64   .17   .50   .60   .42   .60
  HO Safety         .26   .33   .32   .31   .12   .63   .37   .48   .47
  HO Comm           .05   .06   .39   .56   --    --    --    .80
  HO Vehicle        --    --    --    .22   .17   **    --    .31
  JK Soldier        .76   .52   .74   .62   .45   .48   .87   .58   .46
  JK Safety         .55   .37   .75   .38   .71   .51   .72   .58   .33
  JK Comm           .30   .23   .66   .38   --    --    --    --    .29
  JK Vehicle        --    .17   --    .10   .41   **    --    .35
  JK Identify       .46   --    .20   .28   --    .12   --    .24   .21
  SK Soldier        .73   .45   .67   .39   .78   .56   .45   .44   .42
  SK Safety         .47   .32   .53   .62   .57   .47   .30   .64   .32
  SK Comm           .42   .26   .42   --    .41   .35   .20   --    .20
  SK Vehicle        .22   .24   .05   .30   .61   **    .22   .47   .28
  SK Identify       .46   .46   .13   --    --

Effort/Leadership
  Eff/Ldr Rating    .76   .56   .85   .64   .68   .83   .66   .76   .70
  MOS Tech Rtngs    .70   --    .63   .40   .41   .50   .25   .59   .52
  MOS Other Rtng    .77   .41   .48   .43   .54   .62   .43   .61   .56
  Comb Exmplry      .80   .47   .68   .54   .57   .87   .63   .80   .77
  Comb Problems     .48   .20   --    .39   .52   .53   .55   --    .56
  Awards/Cert       .32   .23   .24   .19   .28   .25   .34   .34   .22
  Overall Rating    .46   .39   .33   .17   .57   .42   .65   .41

Discipline
  Discpln Rtng      .77   .58   .73   .45   .63   .85   .74   .58   .73
  Comb Problems     .29   .16   .62   .03   .05   .19   --    .82   .33
  Articles 15       -.63  -.61  -.55  -.62  -.65  -.47  -.69  -.46  -.60
  Promotion Rate    .74   .61   .68   .79   .63   .57   .59   .54   .54
  Overall Rating    .39   .20   .53   .54   .09   .42   .06   .75   .38

Fitness/Bearing
  Fitness Ratngs    .69   .23   .84   .48   .54   .42   .50   .60   .78
  Phys Readiness    .11   .90   .49   .89   .70   .53   .76   .69   .69

Ratings Method
  AW Ratings        .60   .73   .47   .70   .66   .54   .65   .66   .66
  MOS Ratings       .73   .73   .60   .69   .67   .49   .69   .54   .63
  Comb Ratings      .47   .65   .55   .69   .57   .27   .55   .47   .40

Written Method
  JK Tech           --    .47   .28   .55   .59   .73   .44   .58   .57
  JK Soldier        .41   .51   .33   .40   .61   .57   .11   .37   .59
  JK Safety         .37   .52   .12   .63   .08   .49   .17   .76   .57
  JK Comm           .34   .11   .07   .55   --    --    --    --    .52
  JK Vehicle        --    --    --    .42   .62   **    --    .24   .21
  JK Identify       -.15  .23   .50   .36   --    .05   --    .08   .23
  SK Tech           --    .48   .48   .55   .46   .88   .42   .27   .50
  SK Soldier        .50   .66   .54   .59   .15   .51   .54   --    .54
  SK Safety         .53   .55   .42   .29   .34   .48   .44   .19   .60
  SK Comm           .51   .47   .46   --    .16   .24   .05   --    .42
  SK Vehicle        .49   .57   .24   .48   .55   **    .38   .05   .42
  SK Identify       .21   --    .42   .44   --    --    --    --    --

M16 Qualification   .71   .71   .71   .71   .71   .71   .71   .71   .71

** Vehicle content was merged into the Core Technical factor for 64C.



Table  12 


UNIQUENESS ESTIMATES
SEPARATE MODEL FOR EACH JOB

                    Military Occupational Specialty
Factor Score        11B   13B   19E   31C   63B   64C   71L   91A   95B

  HO Tech           .52   .71   .48   .64   .74   .33   .57   .88
  HO Soldier        .59   .66   .75   .52   .95   .74   .55   .76   .63
  HO Safety         .92   .85   .75   .52   .95   .59   .79   .71   .77
  HO Comm           .95   .95   .81   .62   --    --    --    --    .82
  HO Vehicle        --    --    .03   .95   **    --    --    .90
  JK Tech           .21   .30   .15   .12   .39   .17   .11   .53
  JK Soldier        .10   .43   .22   .26   .29   .74   .31   .58   .43
  JK Safety         .32   .53   .32   .31   .45   .49   .44   .15   .57
  JK Comm           .56   .93   .32   .34   --    --    --    .64
  JK Vehicle        --    --    --    .56   .32   **    .94   .82
  JK Identify       .36   .89   .40   .51   --    .95   --    .92   .90
  SK Tech           .27   .13   .09   .10   .14   .14   .15   .52
  SK Soldier        .09   .37   .14   .48   .31   .42   .54   .74   .46
  SK Safety         .46   .59   .43   .41   .50   .55   .72   .47   .55
  SK Comm           .40   .72   .35   --    .65   .82   .78   .67
  SK Vehicle        .73   .62   .69   .55   .18   **    .73   .76   .75
  SK Identify       .45   .10   .22   .25   .25   .34   .10   .13
  Overall Rating    .13   .13   .13   .13   .13   .13   .13   .13   .18
  Eff/Ldr Rating    .11   .11   .11   .11   .11   .05   .11   .11   .05
  Discpln Rating    .22   .22   .22   .22   .22   .05   .22   .22   .06
  Fitness Rating    .38   .38   .38   .38   .38   .05   .38   .38   .05
  MOS Tech Rtngs    .08   .11   .13   .14   .08   .37   .17   .12   .33
  MOS Other Rtng    .10   .13   .17   .19   .12   .35   .20   .18   .27
  Comb Exmplry      .02   .02   .02   .02   .14   .02   .02   .08
  Comb Problems     .13   .13   .13   .13   .13   .60   .13   .13   .40
  Awards/Cert       .89   .94   .93   .95   .91   .94   .86   .85   .90
  Phys Readiness    .95   .33   .67   .34   .50   .83   .46   .49   .49
  Articles 15       .58   .59   .68   .60   .56   .76   .51   .75   .64
  Promotion Rate    .45   .60   .53   .41   .57   .64   .62   .67   .70
  M16               .50   .50   .50   .50   .50   .50   .50   .50   .50

** Vehicle content was merged into the Core Technical factor for 64C.
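
The M16 entries in Tables 11 and 12 (a loading of .71 and a uniqueness of .50 in every MOS) are consistent with the usual relation for a standardized indicator, in which the uniqueness equals one minus the sum of squared loadings. The sketch below illustrates that relation only; it assumes standardized variables and uncorrelated factors, which the report does not state, and the function name `uniqueness` is hypothetical.

```python
def uniqueness(loadings):
    """Unique variance of a standardized indicator.

    Assumption (not stated in the report): the indicator has unit variance
    and the factors it loads on are uncorrelated, so the variance explained
    is the sum of the squared loadings.
    """
    return 1.0 - sum(l * l for l in loadings)


# M16 Qualification loads on a single factor at .71 in Table 11.
m16 = uniqueness([0.71])
print(round(m16, 2))
```

Indicators that load on both a construct factor and a method factor would pass two loadings to the function; where the factors are correlated, as Table 13 indicates for several constructs, the simple sum no longer applies exactly.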




Table  13 


ESTIMATED CONSTRUCT CORRELATIONS
SEPARATE MODEL FOR EACH JOB

                                      Military Occupational Specialty
1st Construct       2nd Construct     11B   13B   19E   31C   63B   64C   71L   91A   95B

Core Technical      Gen Soldiering    --    .77   .83   .63   .58   .73   .48   .66   .70
                    Effort/Lead       .67   .86   .51   .44   .50   .78   .44   .35   .46
                    Discipline        .42   .13   .37   .26   .12   .69   .19   .43   .50
                    Fitness           .25   .01   .03   .04   -.18  -.09  .10   -.05  -.09
                    M16               .27   .00   .04   .11   .05   .05   -.09  -.17  -.10

General Soldiering  Effort/Lead       .89   .58   .57   .53   .44   .37   .43   .40
                    Discipline        .29   .45   .30   .29   .29   .04   .37   .24
                    Fitness           --    -.19  .05   -.05  -.03  -.14  .09   -.05  .00
                    M16               -.06  .30   .30   .04   .11   .27   .02   .02

Effort/Leadership   Discipline        .49   .67   .62   .55   .65   .51   .51   .59   .39
                    Fitness           .57   .04   .38   -.11  .10   .23   .32   .21   .42
                    M16               .38   -.13  .21   .24   -.02  .35   .22   .17   .28

Discipline          Fitness           .33   .05   .24   .24   .30   .30   .27   .19   .25
                    M16               -.12  -.25  -.30  .09   -.28  -.11  .01   -.28  -.08

Fitness             M16               .52   .26   -.05  .02   .19   .22   .18   .27   .26



Table  14 


GOODNESS-OF-FIT INDICES
SEPARATE MODEL FOR EACH JOB

                                        Root Mean
MOS                                     Square Residual   Chi-Square   df    p

11B: Infantryman                        .061              326.2        227   .02
13B: Cannon Crewman                     .057              350.0        322   .14
19E: Tank Crewman                       .065              170.0        348   .999
31C: Radio/Teletype Operator            .069              369.2        375   .58
63B: Vehicle/Generator Mechanic         .060              332.1        296   .07
64C: Motor Transport Operator           .058              280.1        247   .07
71L: Administrative Clerk               .067              232.6        249   .77
91A: Medical Specialist                 .061              277.1        275   .45
95B: Military Police                    .052              470.0        374   .001
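
The p column in Table 14 is the upper-tail probability of the chi-square statistic at the stated degrees of freedom. As an illustration only, the sketch below approximates that probability with the Wilson-Hilferty normal approximation, using just the Python standard library. The approximation is a substitute chosen here for self-containment; the report does not say how its p values were computed, and the function name `chi_square_p` is hypothetical.

```python
import math


def chi_square_p(chi2, df):
    """Approximate upper-tail p-value of a chi-square statistic.

    Uses the Wilson-Hilferty cube-root normal approximation, which is
    reasonable for the large degrees of freedom reported in Table 14.
    """
    k = 2.0 / (9.0 * df)
    z = ((chi2 / df) ** (1.0 / 3.0) - (1.0 - k)) / math.sqrt(k)
    # Upper tail of the standard normal distribution at z.
    return 0.5 * math.erfc(z / math.sqrt(2.0))


# Three rows taken from Table 14: (MOS, chi-square, df).
for mos, chi2, df in [("13B", 350.0, 322), ("31C", 369.2, 375), ("91A", 277.1, 275)]:
    print(mos, round(chi_square_p(chi2, df), 2))
```

A large p value means the model-implied correlations are not significantly different from the observed ones, which is the sense in which the separate-job models are said to fit.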



Table  15 


FACTOR LOADINGS
SINGLE MODEL ACROSS ALL JOBS

                    Military Occupational Specialty
Construct/Factor    11B   13B   19E   31C   63B   64C   71L   91A   95B

Core Technical
  HO Tech           --    .59   .43   .58   .46   .27   .71   .54   .29
  JK Tech           --    .71   .79   .76   .57   .72   .70   .74   .37
  SK Tech           --    .66   .70   .54   .73   .55   .68   .85   .42
  MOS Tech Rtng     **    .21   .12   .16   .25   .01   .12   .05   -.02

General Soldiering
  HO Soldier        .52   .66   .44   .52   .16   .51   .57   .35   .58
  HO Safety         .20   .44   .31   .36   .10   .49   .30   .50   .41
  HO Comm           .06   .12   .37   .52   --    .43
  HO Vehicle        --    .15   .21   **    --    .27
  JK Soldier        .95   .50   .79   .64   .42   .69   .66   .69   .49
  JK Safety         .69   .36   .75   .45   .53   .66   .57   .65   .42
  JK Comm           .35   .25   .59   .51   --    --    --    .39
  JK Vehicle        --    .28   .37   **    --    .07   .34
  JK Identify       .43   .21   .34   .36   --    .12   --    .39   .18
  SK Soldier        .81   .40   .67   .33   .70   .50   .42   .40   .38
  SK Safety         .57   .34   .45   .40   .63   .43   .31   .62   .34
  SK Comm           .51   .21   .31   --    .42   .29   .17   .23
  SK Vehicle        .35   .22   .06   .17   .65   .32   .36   .21

Effort/Leadership
  Eff/Ldr Rating*   .76   .76   .76   .76   .76   .76   .76   .76   .76
  MOS Tech Rtngs*   .59   .33   .54   .50   .45   .62   .43   .62   .62
  MOS Other Rtng*   .77   .59   .33   .45   .59   .48   .47   .58   .58
  Comb Exmplry*     .72   .72   .72   .72   .72   .72   .72   .72   .72
  Comb Problems*    .44   .44   .44   .44   .44   .44   .44   .44   .44
  Awards/Cert*      .26   .26   .26   .26   .26   .26   .26   .26   .26
  Overall Rating*   .48   .48   .48   .48   .48   .48   .48   .48   .48

Discipline
  Discpln Rtng*     .69   .69   .69   .69   .69   .69   .69   .69   .69
  Comb Problems*    .25   .25   .25   .25   .25   .25   .25   .25   .25
  Articles 15*      -.48  -.48  -.48  -.48  -.48  -.48  -.48  -.48  -.48
  Promotion Rate*   .52   .52   .52   .52   .52   .52   .52   .52   .52
  Overall Rating*   .28   .28   .28   .28   .28   .28   .28   .28   .28

Fitness/Bearing
  Fitness Ratngs*   .82   .82   .82   .82   .82   .82   .82   .82   .82
  Phys Readiness*   .37   .37   .37   .37   .37   .37   .37   .37   .37

Ratings Method
  AW Ratings*       .56   .56   .56   .56   .56   .56   .56   .56   .56
  MOS Ratings*      .61   .61   .61   .61   .61   .61   .61   .61   .61
  Comb Ratings*     .42   .42   .42   .42   .42   .42   .42   .42   .42

Written Method
  JK Tech           --    .49   .29   .54   .71   .30   .42   .49   .49
  JK Soldier        -.16  .51   .29   .40   .53   .25   .28   .60   .60
  JK Safety         -.07  .49   .07   .52   .26   .28   .35   .52   .52
  JK Comm           .00   .11   .19   .38   --    --    .41   .41
  JK Vehicle        --    --    --    .19   .62   **    .20   .20
  JK Identify       -.05  .20   .12   .17   --    .10   --    .25   .25
  SK Tech           --    .54   .65   .64   .49   .71   .45   .53   .53
  SK Soldier        .44   .68   .58   .61   .25   .66   .50   .60   .60
  SK Safety         .34   .51   .49   .57   .18   .56   .30   .59   .59
  SK Comm           .51   .46   .60   --    .20   .36   .20   .50   .50
  SK Vehicle        .38   .51   .17   .60   .45   **    .17   .46   .46

*  These loadings were constrained to be equal across all MOS.

** Vehicle content was merged into the Core Technical factor for 64C.




UNIQUENESS ESTIMATES
SINGLE MODEL ACROSS ALL JOBS

                    Military Occupational Specialty
Factor Score        11B   13B   19E   31C   63B   64C   71L   91A   95B

  HO Tech           .62   .79   .62   .76   .91   .44   .68   .90
  HO Soldier        .72   .58   .80   .70   .95   .73   .64   .87   .67
  HO Safety         .95   .84   .90   .87   .95   .73   .90   .75   .81
  HO Comm           .95   .95   .86   .71   --    --    .82
  HO Vehicle        --    --    --    .95   .95   **    --    .93
  JK Tech           .23   .28   .13   .15   .32   .28   .16   .60
  JK Soldier        .10   .44   .28   .40   .48   .41   .44   .47   .40
  JK Safety         .48   .56   .41   .49   .62   .44   .55   .26   .54
  JK Comm           .85   .91   .57   .55   --    --    --    .67
  JK Vehicle        --    --    --    .87   .44   **    --    .95   .85
  JK Identify       .71   .90   .84   .81   .95   .64   .90
  SK Tech           --    .25   .10   .24   .18   .17   .27   .19   .54
  SK Soldier        .13   .37   .20   .52   .41   .31   .58   .83   .49
  SK Safety         .54   .62   .54   .51   .55   .51   .80   .29   .54
  SK Comm           .46   .75   .48   --    .77   .78   .92   --    .70
  SK Vehicle        .75   .68   .95   .61   .31   **    .86   .86   .75
  Overall Rating*   .18   .18   .18   .18   .18   .18   .18   .18   .18
  Eff/Ldr Rating*   .09   .09   .09   .09   .09   .09   .09   .09   .09
  Discpln Rating*   .17   .17   .17   .17   .17   .17   .17   .17   .17
  Fitness Rating*   .05   .05   .05   .05   .05   .05   .05   .05   .05
  MOS Tech Rtngs*   .18   .34   .22   .24   .18   .18   .18   .18   .25
  MOS Other Rtng*   .05   .24   .46   .37   .05   .05   .05   .05   .27
  Comb Exmplry*     .26   .26   .26   .26   .26   .26   .26   .26   .26
  Comb Problems*    .29   .29   .29   .29   .29   .29   .29   .29   .29
  Awards/Cert*      .93   .93   .93   .93   .93   .93   .93   .93   .93
  Phys Readiness*   .83   .83   .83   .83   .83   .83   .83   .83   .83
  Articles 15*      .77   .77   .77   .77   .77   .77   .77   .77   .77
  Promotion Rate*   .70   .70   .70   .70   .70   .70   .70   .70   .70

*  These loadings were constrained to be equal across all MOS.

** Vehicle content was merged into the Core Technical factor for 64C.


[Figure: matrix of criterion measures (rows) against the performance constructs and method factors (columns). Column headings: Core Technical Proficiency; General Soldiering Proficiency; Effort/Leadership; Personal Discipline; Physical Fitness/Military Bearing; Written Knowledge Tests; Rating Scales; M16 Qualification. The cell entries did not survive reproduction.]

PRELIMINARY MODEL OF ENLISTED JOB PERFORMANCE


Figure 2

Functional Categories by Job and Method


36. General Medical Knowledge
37. Responding to Alarms
38. Conduct MP Procedures
39. Patrol Duties
91. Power Train and Clutch


Figure  3 


SAMPLE FUNCTIONAL DUTY CLUSTER DEFINITIONS


First  Aid 


Consists of items whose primary purpose is to
indicate knowledge about how to sustain life,
prevent  health  complications  caused  by  trauma  or 
environmentally  induced  illness,  including  the 
practice  of  personal  hygiene.  Includes  all  related 
diagnostic,  transportation,  and  treatment  items 
except  those  items  normally  performed  in  a  patient 
care  facility.  Includes  items  related  to  safety 
and  safety  hazards. 


Navigate


Consists  of  items  whose  primary  purpose  is  to 
indicate  knowledge  about  how  to  plan  or  execute 
movement  between  points  over  unknown  terrain  either 
cross-country  or  using  road  networks,  or  identify 
the  location  of  objects.  Includes  all  means  of 
determining  direction,  distances,  and  locations 
using  maps  of  all  types,  overlays,  compasses, 
terrain,  celestial  objects,  and  field  expedients. 


NBC

Consists  of  items  whose  primary  purpose  is  to 
indicate  knowledge  about  performance  when  nuclear, 
biological,  or  chemical  contaminants  and  threats 
are  present,  planned,  detected,  or  expected. 
Includes  maintenance  and  operation  of  clothing, 
gear,  and  equipment  whose  primary  purpose  is  to 
counter,  protect,  or  detect  NBC  threats.  Includes 
NBC  markers.  Does  not  include  first  aid  treatment 
of  contamination. 

Weapons


Consists  of  items  whose  primary  purpose  is  to 
indicate  knowledge  about  maintenance,  preparation, 
and  firing  of  small  arms.  Small  arms  are  defined 
as  sized  weapons,  including  automatic  weapons,  up 
to and including caliber .60 and shotguns. Includes
ancillary sighting systems and techniques, stands
and mounts, zeroing and techniques of fire. Excludes
firing from aircraft and vehicles where the weapon
is fired by electrical/hydraulic aiming/firing
systems  and  sighting  systems  that  are  part  of  the 
aircraft/vehicle  and  not  part  of  the  weapon. 




Figure  4 


DEFINITIONS OF THE PERFORMANCE CONSTRUCTS


• Core Technical Proficiency

This performance construct represents the proficiency
with which the soldier performs the tasks that are
"central" to the MOS. The tasks represent the core of
the job and they are the primary definers of the MOS.

For example, the first tour Armor Crewman starts and
stops the tank engines; prepares the loader's station;
loads and unloads the main gun; boresights the M60A3;
engages targets with the main gun; and performs misfire
procedures. This performance construct does not include
the individual's willingness to perform the task or the
degree to which the individual can coordinate efforts
with others. It refers to how well the individual can
execute the core technical tasks the job requires, given
a willingness to do so.


• General Soldiering Proficiency

In addition to the core technical content specific to an
MOS, individuals in every MOS also are responsible for
being able to perform a variety of general soldiering
tasks -- for example, determines grid coordinates on
military maps; puts on, wears and removes M17 series
protective mask with hood; determines a magnetic azimuth
using a compass; collects/reports information - SALUTE;
and recognizes and identifies friendly and threat
aircraft. Performance on this construct represents
overall proficiency on these general soldiering tasks.
Again, it refers to how well the individual can execute
general soldiering tasks, given a willingness to do so.


• Effort and Leadership

This  performance  construct  reflects  the  degree  to  which 
the  individual  exerts  effort  over  the  full  range  of  job 
tasks,  perseveres  under  adverse  or  dangerous  conditions, 
and  demonstrates  leadership  and  support  toward  peers. 

That  is,  can  the  individual  be  counted  on  to  carry  out 
assigned  tasks,  even  under  adverse  conditions,  to  exercise 
good  judgment,  and  to  be  generally  dependable  and 
proficient?  While  appropriate  knowledges  and  skills  are 
necessary  for  successful  performance,  this  construct  is 
only  meant  to  reflect  the  individual's  willingness  to  do 
the  job  required  and  to  be  cooperative  and  supportive 
with  other  soldiers. 




• Personal Discipline

This performance construct reflects the degree to which
the individual adheres to Army regulations and traditions,
exercises personal self-control, demonstrates integrity
in day-to-day behavior, and does not create disciplinary
problems. People who rank high on this construct show a
commitment to high standards of personal conduct.


• Physical Fitness and Military Bearing

This performance construct represents the degree to
which the individual maintains an appropriate military
appearance and bearing and stays in good physical
condition.


PREDICTING  SOLDIER  PERFORMANCE: 
ASSESSMENT  OF  TEMPERAMENT  CONSTRUCTS  AS 
PREDICTORS  OF  JOB  PERFORMANCE 


Leaetta  M.  Hough 
Steven  D.  Ashworth 

Personnel  Decisions  Research  Institute 


Presented  at  the  Annual  Conference  of  the 
Society  for  Industrial  and  Organizational  Psychology 

Atlanta,  Georgia 

April  1987 


The  views  expressed  in  this  paper  are  those  of  the  authors  and  do  not 
necessarily  reflect  the  official  opinions  and  policies  of  the  U.S.  Army 
Research  Institute  or  the  Department  of  the  Army. 



Personnel Decisions Research Institute (PDRI), along with the Human
Resources Research Organization (HumRRO) and the American Institutes for
Research (AIR), has been involved in research with the Army Research
Institute (ARI) to augment the prediction of job performance of enlisted
Army personnel.

It is a major project involving four research institutes, several million
dollars, several years of effort, and thousands of soldiers. I’m going to
talk about a small part of that project--the development and validation of
temperament predictors, our strategy, and the results obtained.

The  topics  that  I’m  going  to  cover  today  are: 

I.  Literature  Review 

A.  Strategy 

B. Results

II. Development and Evaluation of Temperament Scales--"Assessment
of Background and Life Experiences" (ABLE)

A.  Target  constructs 

B.  ABLE  scale  characteristics 

C.  ABLE  factor  structure 

D.  Definition  of  criterion  composites 

E.  Zero-order  validities  of  temperament  scales 

F.  Contribution  compared  to  other  predictors 

III.  Evaluation  of  Response  Validity  Scales 

A.  "Non-Random  Response"  scale 

B.  "Self-Knowledge"  scale 

C.  "Social  Desirability"  scale 

D.  "Poor  Impression" 




Literature  Review 


The  project  began  in  1982,  about  five  years  ago,  with  a  thorough 
literature  review  to  identify  potentially  useful  predictors  of  criteria 
important  to  the  Army. 

Our approach was construct-oriented for both predictors and criteria.
Thus, we needed a classification strategy or taxonomy for both
predictors and criteria.

Predictor and criterion taxonomies. Our classification system for
criteria was: (1) educational, (2) training, (3) job involvement, (4)
job proficiency, and (5) adjustment. Within each of these broad
categories we had subcategories such as supervisory/teacher ratings, GPA,
etc. For the predictors, we started with the structure found by Tupes
and Christal (1961) in their factor analysis of peer ratings. These
factors were essentially replicated by Norman (1963) in his work with
peer ratings of temperament. These five factors are what are referred to
today as the "Big Five." Following Hogan’s thinking at the time of our
literature review, we separated "Affiliation" from the "Surgency"
construct. Thus, our taxonomy consisted of the following: (1) Surgency,
(2) Affiliation, (3) Adjustment, (4) Agreeableness, (5) Dependability,
and (6) Intellectance.

Categorization of temperament scales. Once we had a taxonomy, our
next step was to categorize the existing temperament scales into the
classification scheme. From articles and manuals, we obtained hundreds of
correlations between temperament scales. We categorized the temperament
scales into the six categories, plus a seventh miscellaneous category, and
then refined the classifications through an iterative process of
classifying and reclassifying temperament scales to maximize the mean
within-category correlations and minimize the mean between-category
correlations. The results of this bootstrapping process are shown in
Table 1. The circles in the diagonal show the mean within-category
correlations. As can be seen, they are in the .30s and .40s and are, in
all cases, higher than the mean between-category correlations. Mean
correlations in the .30s and .40s, however, suggest that the categories
are not all that homogeneous. We could have increased the mean
within-category correlations by putting more scales in the miscellaneous
category; that, however, would have defeated our purpose of summarizing
criterion-related validities according to constructs.
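
The quantity being optimized in this reclassification step can be sketched in a few lines. The sketch below is illustrative only: the scale names, the correlations, and the function name `category_homogeneity` are hypothetical, and it shows how one candidate assignment of scales to categories would be scored, not the procedure actually used in the project.

```python
from statistics import mean


def category_homogeneity(correlations, category_of):
    """Return (mean within-category r, mean between-category r).

    correlations: dict mapping frozenset({scale_a, scale_b}) -> r
    category_of:  dict mapping scale name -> category label
    """
    within, between = [], []
    for pair, r in correlations.items():
        a, b = tuple(pair)
        if category_of[a] == category_of[b]:
            within.append(r)
        else:
            between.append(r)
    return mean(within), mean(between)


# Hypothetical scales and correlations, for illustration only.
correlations = {
    frozenset({"dominance", "energy"}): 0.40,
    frozenset({"stability", "calmness"}): 0.36,
    frozenset({"dominance", "stability"}): 0.20,
    frozenset({"dominance", "calmness"}): 0.10,
    frozenset({"energy", "stability"}): 0.15,
    frozenset({"energy", "calmness"}): 0.05,
}

# Two candidate assignments of the same four scales to two categories.
good = {"dominance": "Surgency", "energy": "Surgency",
        "stability": "Adjustment", "calmness": "Adjustment"}
bad = dict(good, energy="Adjustment", calmness="Surgency")

print(category_homogeneity(correlations, good))
print(category_homogeneity(correlations, bad))
```

An assignment is preferred when the first value is high relative to the second; an iterative reclassification would move one scale at a time and keep the move only if that gap widens.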

Meta analysis of criterion-related validities. Our next step was to
summarize the criterion-related validities according to these predictor and
criterion constructs. The next page of your handout, Table 2, shows the
results. It is a meta analysis of the criterion-related validities of
scales within each predictor construct for each criterion construct. As
you can see, several constructs correlate with the various criteria. Note
that there are three additional predictor constructs. These three,
"Achievement," "Masculinity," and "Locus of Control," were all a part of
the miscellaneous category. When we summarized the validities for the
miscellaneous category, we found respectable validities there too, so we
looked more closely at the scales included in the miscellaneous category
and found these additional three constructs. We summarized the validities
separately for these three constructs. Thus, in terms of
criterion-related validities, the five basic constructs did not appear to
cover the domain.

The results in this table are different from the results that Guion
and Gottier published in their 1965 Personnel Psychology article; their
conclusions were quite discouraging. They concluded that temperament
variables have validity more often than can be expected by chance, but
that no generalized principles can be discerned from the overall results.
We believe that our strategy of summarizing the validities according to
both predictor and criterion constructs accounts for the difference in
results. The constructs provide the "generalized principles." To test this
hypothesis, we summarized the validity coefficients in our database
without regard to construct and obtained a coefficient of essentially
zero, quite different from the coefficients in Table 2. We believe this
demonstrates the importance of constructs as organizing principles for
examining and understanding the literature on the criterion-related
validity of temperament variables.
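
The contrast between summarizing by construct and summarizing without regard to construct can be illustrated with a small sketch. Everything in it is an assumption made for illustration: the records are invented, the negative Surgency validity is hypothetical, and sample-size weighting is one common way to average correlations, not necessarily the weighting used for Table 2.

```python
from collections import defaultdict


def weighted_mean_r(studies):
    """Sample-size-weighted mean of (r, n) pairs."""
    total_n = sum(n for _, n in studies)
    return sum(r * n for r, n in studies) / total_n


def summarize_by_construct(records):
    """Average validities within each (predictor, criterion) construct cell.

    records: iterable of (predictor_construct, criterion_construct, r, n)
    """
    cells = defaultdict(list)
    for predictor, criterion, r, n in records:
        cells[(predictor, criterion)].append((r, n))
    return {cell: weighted_mean_r(rs) for cell, rs in cells.items()}


# Hypothetical validity coefficients, for illustration only.
records = [
    ("Dependability", "adjustment", 0.30, 100),
    ("Dependability", "adjustment", 0.20, 100),
    ("Surgency", "adjustment", -0.25, 200),
]

by_cell = summarize_by_construct(records)
pooled = weighted_mean_r([(r, n) for _, _, r, n in records])

print(by_cell)   # one mean validity per construct pairing
print(pooled)    # a single mean that ignores the constructs
```

In this invented example the two construct cells have validities of opposite sign, so the pooled mean cancels toward zero even though each cell shows a clear relationship.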

The validities in Table 2 are more similar to the results published by
Ghiselli in his 1966 book Validity of Occupational Aptitude Tests.
Ghiselli, however, summarized validities only for those temperament scales
that he evaluated as pertinent to a particular occupational category. We
believe that summarizing validities according to constructs enabled us to
arrive at conclusions similar to Ghiselli’s.

Development  of  ABLE^  Scales 

The  next  major  task  for  us  was  to  develop  scales  that  would  measure 
variables  identified  during  the  literature  review  as  likely  to  predict 
criteria important to the Army. The next page of your handout, List 1,
lists  the  substantive  scales  we  developed  for  each  construct.  We  developed 
substantive  scales  for  six  constructs:  (1)  Surgency,  (2)  Adjustment,  (3) 
Agreeableness,  (4)  Dependability,  (5)  Achievement,  and  (6)  Locus  of  Control. 


^  "Assessment  of  Background  and  Life  Experiences." 



We also developed a "Physical Condition" scale to measure physical
condition and four response validity scales: (1) Non-Random Response, (2)
Social Desirability, (3) Poor Impression, and (4) Self-Knowledge. We
developed the "Non-Random Response" scale because we were concerned that
some participants would complete the inventory carelessly because the data
were gathered for "research purposes only." The "Non-Random Response"
scale was developed to detect such inventories. We were also concerned
about self-descriptions that were intentionally distorted and wanted to be
able to (1) detect intentional distortion, and (2) develop a strategy for
dealing with distorted self-descriptions. Thus, we developed the "Social
Desirability" scale to detect intentional distortion in an applicant
setting (non-draft setting) and "Poor Impression" to detect intentional
distortion in a draft setting. We developed a "Self-Knowledge" scale
because the literature suggests that people who know themselves well
provide more accurate self-descriptions, and this greater accuracy
moderates the correlation between self-description and descriptions or
ratings made by others (Gibbons, 1983; Markus, 1983). We hypothesized that
"Self-Knowledge" might moderate the relationships between ABLE substantive
scales and job performance criteria. In short, we developed four response
validity scales to measure accuracy of self-descriptions in order to test
the hypothesis that accuracy of self-description moderates the
criterion-related validities of ABLE substantive scales.

Evaluation  of  ABLE  Substantive  Scales 

Once the ABLE temperament scales were developed and pretested,
predictor and criterion data were gathered during the summer and fall of
1985 from over 9,000 soldiers. The scale statistics for the temperament
inventory, entitled "Assessment of Background and Life Experiences"
(ABLE), appear on the next page of your handout, Table 3. The average
number of items in a scale is 15. The median alpha of the substantive
scales is .81. Table 4 summarizes the ABLE substantive scale statistics
and the correlations of the ABLE substantive scales with each other and
with other components of the four-hour predictor battery. As can be seen,
the only part of the predictor battery that the ABLE substantive scales
correlate with in a substantial way is the other ABLE substantive scales.
The ABLE substantive scales appear to be tapping a part of the predictor
domain not tapped by other measures.
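
For readers unfamiliar with the alpha statistic quoted above, the following is a minimal sketch of coefficient alpha (Cronbach's alpha) computed from item-level responses. The response data and the function name `cronbach_alpha` are hypothetical; the ABLE item data themselves are not reproduced in this paper.

```python
from statistics import variance


def cronbach_alpha(item_scores):
    """Coefficient alpha for a scale.

    item_scores: list of respondents, each a list of k item scores.
    """
    k = len(item_scores[0])
    # Variance of each item across respondents.
    item_vars = [variance([resp[i] for resp in item_scores]) for i in range(k)]
    # Variance of the total (summed) scale score.
    total_var = variance([sum(resp) for resp in item_scores])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


# Hypothetical responses: five respondents, three items.
responses = [
    [3, 4, 3],
    [2, 2, 3],
    [4, 5, 4],
    [1, 2, 2],
    [3, 3, 4],
]

alpha = cronbach_alpha(responses)
print(round(alpha, 2))
```

Alpha rises as the items covary more strongly relative to their individual variances, which is why it is read as an index of internal consistency.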

The next page of your handout, Table 5, shows the structure of the
ABLE substantive scales. Three factors, Ascendancy, Dependability, and
Adjustment, emerged. The scales designed to measure Achievement loaded on
the same factor as the scale designed to measure Surgency. The literature
review indicated that measures of Achievement and Surgency are not highly
intercorrelated. Unfortunately, the ABLE scales do not appear to capture
the uniqueness of the two constructs.

Criterion-Related  Validities.  The  criterion  measures,  the  development 
of  which  was  a  major  part  of  the  research  project,  were  developed  by  a 
different  part  of  the  research  team.  The  criterion  composites,  which  they 
also developed, are very briefly described on the next page of your
handout, List 2. There are five composites: (1) Core Technical Proficiency,
(2)  General  Soldiering  Proficiency,  (3)  Effort  and  Leadership,  (4)  Personal 
Discipline,  and  (5)  Physical  Fitness  and  Military  Bearing.  The  first  two 
consist  mainly  of  hands-on  tests  (work  samples)  and  knowledge  tests.  The 
other  three  consist  of  supervisory  and  peer  ratings  and  information  that 
can  be  obtained  from  personnel  records. 

The next page of your handout, Table 6, shows the criterion-related
validities of the ABLE scales for these five criteria. The scales are
organized according to the literature review taxonomy of temperament
constructs. The results suggest that "Achievement" scales are the best
predictors of the "Effort and Leadership" criterion; "Dependability" scales
are the best predictors of the "Personal Discipline" criterion; and
"Physical Condition" is the best predictor of the "Physical Fitness and
Military Bearing" criterion, though the "Achievement" scales also correlate
with this criterion. These three criteria, which the ABLE substantive
scales predict, include the supervisory and peer rating criteria. The other
two criteria, which consist of hands-on and knowledge tests, are not
predicted by the ABLE substantive scales. The other finding to which I’d
like to draw your attention is that, except for "Poor Impression," the
response validity scales do not correlate with the supervisory and peer
rating criteria. This finding will be relevant later when we analyze the
response validity scales in detail.

The next page of your handout, Table 7, shows the criterion-related
validities of different types of predictors--cognitive ability, spatial
ability, perceptual/psychomotor ability, work environment preferences,
temperament, and interests--included in the study. It shows the multiple
correlation of each type of predictor with each of the five criteria. As
can be seen, the best predictors of the Effort and Leadership, Personal
Discipline, and Physical Fitness and Military Bearing criteria are the ABLE
substantive scales. This finding is not surprising, given the literature
review and the results that showed that the ABLE substantive scales tap an
independent part of the predictor domain.

Evaluation  of  Response  Validity  Scales 

Recall that we hypothesized that accuracy of self-description
moderated the criterion-related validities of the ABLE substantive scales
and that we developed four response validity scales to detect four
different types of inaccurate self-descriptions.

"Non-Random Response" scale. To evaluate the usefulness of the
"Non-Random Response" scale, we examined its moderating effect on the
validities of the ABLE substantive scales. We split the sample into two
groups: those who scored low on the "Non-Random Response" scale were
designated as "random responders," and the remaining sample was designated
as "non-random responders." We performed a split-group analysis rather
than a moderated regression because the variable of interest had a highly
skewed distribution. The results, which are shown in Table 9, indicate that
random responding does, indeed, moderate the criterion-related validities
of the ABLE substantive scales. Though the validities of the ABLE
substantive scales are not uniformly zero for the random responders,
typically the validities are significantly lower.
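
A split-group analysis of this kind can be sketched as follows. The sketch splits respondents at a cutoff on a validity scale, computes the predictor-criterion correlation in each group, and compares two correlations with a Fisher z test. The Fisher test is an assumption on our part (the paper does not say which significance test was used), and the records, cutoff, and correlation values are invented; only the approximate group sizes echo those reported for the concurrent sample.

```python
import math


def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def split_group_validities(records, cutoff):
    """records: list of (validity_scale_score, predictor, criterion).

    Returns (r in the low-scoring group, r in the high-scoring group).
    """
    low = [(p, c) for v, p, c in records if v < cutoff]
    high = [(p, c) for v, p, c in records if v >= cutoff]
    return pearson_r(*zip(*low)), pearson_r(*zip(*high))


def fisher_z_difference(r1, n1, r2, n2):
    """z statistic for the difference between two independent correlations."""
    se = math.sqrt(1 / (n1 - 3) + 1 / (n2 - 3))
    return (math.atanh(r1) - math.atanh(r2)) / se


# Hypothetical records: (validity scale score, predictor, criterion).
records = [
    (0, 1, 2), (0, 2, 1), (0, 3, 2), (0, 4, 1),   # low scorers
    (5, 1, 1), (5, 2, 2), (5, 3, 3), (5, 4, 4),   # high scorers
]
r_low, r_high = split_group_validities(records, cutoff=3)

# Hypothetical validities of .20 and .05 with group sizes of about
# 8,400 and 670.
z = fisher_z_difference(0.20, 8400, 0.05, 670)
print(r_low, r_high, z)
```

Because the smaller group dominates the standard error, a difference between groups has to be fairly large before it reaches significance, whatever the size of the larger group.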

"Self-Knowledge"  scale.  Previous  research,  as  mentioned  earlier,  has 
shown that self-knowledge moderates the correlation between
self-description and descriptions or ratings provided by others. We
examined the extent to which self-knowledge moderates the relationship
between self-descriptions as measured by the ABLE substantive scales and
job performance criteria. For each ABLE substantive scale, we computed (1)
the zero-order correlation with each criterion, (2) the multiple
correlation of the substantive scale and "Self-Knowledge" scale with each
criterion, and (3) the multiple correlation based on moderated regression.
In moderated regression analysis, the multiple correlation is incremented
by an interaction term, in this case the interaction between
"Self-Knowledge" and the particular ABLE substantive scale. We compared
these three coefficients for each substantive scale. If Self-Knowledge
moderates the criterion-related validities of the substantive scales, the
value of the moderated multiple correlation coefficient would be greater
than the multiple correlation. If the values are similar, the relationship
is not moderated; a linear model accounts for as much of the variance as
the non-linear model. If both values are similar to the zero-order
correlation of the substantive scale with the criterion, then the
Self-Knowledge scale increments the validity in neither a linear nor a
non-linear (moderated) fashion. As is shown in Table 10, the
"Self-Knowledge" scale contributes nothing, in either a linear or
non-linear way, to the prediction of the criteria.
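
The three coefficients compared above can be sketched with ordinary least squares. This is a minimal illustration under stated assumptions: the data are invented so that the criterion depends entirely on the interaction, the helper names (`solve`, `multiple_r`, `moderation_check`) are ours, and the zero-order value is returned as an absolute correlation because it is computed as a one-predictor multiple R.

```python
import math


def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for row in range(n):
            if row != col:
                factor = m[row][col] / m[col][col]
                m[row] = [x - factor * y for x, y in zip(m[row], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def multiple_r(predictors, y):
    """Multiple correlation of y with the predictor columns (with intercept)."""
    rows = [[1.0, *vals] for vals in zip(*predictors)]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    beta = solve(xtx, xty)
    fitted = [sum(b * v for b, v in zip(beta, r)) for r in rows]
    mean_y = sum(y) / len(y)
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    sst = sum((yi - mean_y) ** 2 for yi in y)
    return math.sqrt(max(0.0, 1 - sse / sst))


def moderation_check(scale, moderator, criterion):
    """Zero-order, additive, and moderated (interaction) multiple correlations."""
    product = [s * m for s, m in zip(scale, moderator)]
    return {
        "zero_order": multiple_r([scale], criterion),
        "additive": multiple_r([scale, moderator], criterion),
        "moderated": multiple_r([scale, moderator, product], criterion),
    }


# Hypothetical data in which the criterion depends on the interaction.
scale = [1, 2, 3, 4, 1, 2, 3, 4]
moderator = [0, 0, 0, 0, 1, 1, 1, 1]
criterion = [s * m for s, m in zip(scale, moderator)]

result = moderation_check(scale, moderator, criterion)
print(result)
```

In this invented case the moderated coefficient clearly exceeds the additive one, which is the pattern that would indicate moderation; the result reported for "Self-Knowledge" is the opposite pattern, with all three coefficients essentially equal.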

"Social  Desirability"  and  "Poor  Impression"  scales.  The  same  logic 
that applies to the "Non-Random Response" scale applies to the "Social
Desirability" and "Poor Impression" scales; that is, accuracy of
self-description moderates the relationship between ABLE substantive
scales and job criteria. First, though, we wanted to know if "Social
Desirability" and "Poor Impression" detected intentional distortion.

To learn whether the "Social Desirability" and "Poor Impression"
scales detected intentional distortion, we conducted a faking study with
245 soldiers in which we instructed them to respond honestly, to fake good,
and/or to fake bad. Table 11 summarizes the results of that study.
Clearly, soldiers were able to distort their self-descriptions when
instructed to do so. The median effect size for change in ABLE substantive
mean scale scores between the honest and fake-good conditions was
approximately half a standard deviation. The median effect size for change
in ABLE substantive mean scale scores between the honest and fake-bad
conditions was over two standard deviations. Fortunately, the "Social
Desirability" scale detected faking good--it changed approximately one
standard deviation--and the "Poor Impression" scale detected faking
bad--it changed over two-and-a-half standard deviations.
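
Effect sizes of this kind express a difference between condition means in standard deviation units. The sketch below uses a pooled standard deviation for two independent groups; that choice is an assumption, since the paper does not state how the standard deviation was obtained or whether the comparison was within subjects, and the scores shown are invented.

```python
import math
from statistics import mean, variance


def effect_size(honest, faked):
    """Standardized mean difference (faked minus honest), pooled SD."""
    na, nb = len(honest), len(faked)
    pooled_var = ((na - 1) * variance(honest)
                  + (nb - 1) * variance(faked)) / (na + nb - 2)
    return (mean(faked) - mean(honest)) / math.sqrt(pooled_var)


# Hypothetical scale scores under two instruction conditions.
honest = [10, 12, 14, 16, 18]
fake_good = [13, 15, 17, 19, 21]

d = effect_size(honest, fake_good)
print(round(d, 2))
```

A value near 0.5 corresponds to the half standard deviation shift reported for faking good, and a value above 2 to the shift reported for faking bad.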

We  then  examined  the  extent  to  which  the  "Social  Desirability"  scale 
moderated  the  criterion-related  validities  of  the  ABLE  substantive  scales. 
Unfortunately,  we  had  criterion  data  for  only  a  few  of  the  soldiers  in  the 
faking  study.  Those  data  would  have  been  the  best  to  examine  because  we 
knew  their  motivation--they  were  faking  good.  We  turned  to  the  next  best 
set  of  data--the  concurrent  validation  sample--to  answer  the  question. 
Again, our variable had a highly skewed distribution, so we used a
split-group technique to investigate the moderating effects of "Social
Desirability." We split the group in two. We chose the cutting point to be
approximately  the  mean  of  the  fake  good  group  of  the  faking  study.  Thus, 
the  "high"  group  scored  approximately  at  or  above  the  mean  of  a  group  known 
to  be  faking.  Table  12  shows  the  results.  The  "Social  Desirability"  scale 
does  moderate,  slightly,  the  validities  of  the  ABLE  substantive  scales. 
Interestingly,  the  validities  for  the  "Personal  Discipline"  criterion  are 
least  affected. 

Recall  that  the  "Poor  Impression"  scale  correlated  with  the  criteria; 
thus,  it  was  inappropriate  to  investigate  its  moderating  effects  on  the 
validities  of  the  other  ABLE  scales,  at  least  in  the  present  sample.  We, 
therefore,  examined  the  contribution  of  the  "Poor  Impression"  scale  as  an 
independent  predictor  in  a  linear  model.  Your  next  handout,  Table  13, 
shows  the  zero-order  correlation  of  each  of  the  ABLE  substantive  scales 
with  each  criterion,  as  well  as  the  multiple  correlation  of  each  ABLE 
substantive  scale  when  "Poor  Impression"  is  included  as  an  independent 
predictor.  As  can  be  seen,  the  "Poor  Impression"  scale  does  increment  the 
validities  of  the  ABLE  substantive  scales. 
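
For the two-predictor case described here, the multiple correlation can be written directly in terms of the three zero-order correlations. The formula below is the standard result for two predictors; the subscripts (1 for the substantive scale, 2 for "Poor Impression", y for the criterion) and the numbers in the worked line are hypothetical and are not taken from Table 13.

```latex
R_{y\cdot 12}^{2}
  = \frac{r_{y1}^{2} + r_{y2}^{2} - 2\, r_{y1}\, r_{y2}\, r_{12}}
         {1 - r_{12}^{2}}

% Worked example with hypothetical values
% r_{y1} = .20, r_{y2} = -.15, r_{12} = -.10:
R_{y\cdot 12}^{2}
  = \frac{.0400 + .0225 - .0060}{1 - .0100}
  = \frac{.0565}{.9900} \approx .057,
\qquad R_{y\cdot 12} \approx .24
```

The increment in validity is the difference between the multiple correlation and the zero-order correlation of the substantive scale alone, here about .24 versus .20.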




Summary


To  summarize  the  results  of  our  work: 

1. Constructs provide a strategy to make sense of the literature on
the criterion-related validities of temperament scales;

2. The "Big Five" as a taxonomy results in quite a heterogeneous
grouping of scales, though they do highlight constructs that are
"good bets" for predictors;

3. Temperament variables measure a part of the predictor domain
untapped by most other types of predictors;

4. Temperament variables predict certain kinds of criteria, criteria
that are not well-predicted by most other predictors;

5. "Self-Knowledge" does not appear to affect the relationship between
other temperament variables and job performance criteria; and

6. The response validity scales that detect random responding and
self-descriptions that are overly positive or negative can be used to
increment the validities of temperament variables for job performance
criteria.




References 


Ghiselli, E. E. (1966). Validity of occupational aptitude tests. New
York: John Wiley & Sons, Inc.

Gibbons, F. X. (1983). Self-attention and self-report: The
"veridicality" hypothesis. Journal of Personality, 51, 517-542.

Guion, R. M., & Gottier, R. F. (1965). Validity of personality
measures in personnel selection. Personnel Psychology, 18, 135-164.

Markus, H. (1983). Self-knowledge: An expanded view. Journal of
Personality, 51, 543-565.

Norman, W. T. (1963). Toward an adequate taxonomy of personality
attributes: Replicated factor structure in peer nomination personality
ratings. Journal of Abnormal and Social Psychology, 66, 574-583.

Tupes, E. C., & Christal, R. E. (1961, May). Recurrent personality
factors based on trait ratings (ASD-TR-61-97). Lackland Air Force
Base, TX: Aeronautical Systems Division, Personnel Laboratory.




Table 1 Mean Within-Category and Between-Category
Correlations of Temperament Scales

[Table body illegible in the source copy. Rows and columns: Surgency,
Adjustment, Agreeableness, Dependability, Intellectance, Affiliation,
Miscellaneous. Each cell reported a mean correlation, its standard
deviation (SD), and N.]


Table 2 Meta Analysis of Criterion-Related Validity Studies
That Used Temperament Predictors^


Achievement 

8 

0  ‘ 

.33 

A 

.24 

Masculinity 

8 

E0  ’ 

.09 

10 

.10 

Locus  of  Control 

1 

.32  2 

.29 

7 

.25 

Time Period 1960-1984.

^ A star denotes the construct is one of the "Big Five" constructs.

Note: Correlations are not corrected for unreliability or range restriction.




List 1


ABLE^  Scales  Organized  According  to  Construct  Intended  to  Measure 


SUBSTANTIVE  SCALES: 


Surgency

.  Dominance 
.  Energy  Level 

Adjustment 

.  Emotional  Stability 

Agreeableness (Likeability)
.  Cooperativeness 

Dependability 

.  Nondelinquency 
.  Traditional  Values 
.  Conscientiousness 

Achievement 

.  Work  Orientation
.  Self  Esteem 

Locus  of  Control 

.  Internal  Control 

Physical  Condition 

.  Physical  Condition 


RESPONSE  VALIDITY  SCALES: 

.  Non-Random  Response 
.  Social  Desirability 
.  Poor  Impression 
.  Self-Knowledge 


^ Inventory developed by PDRI for the Army Research Institute entitled
"Assessment of Background and Life Experiences."




Table 3 ABLE Scale Statistics for Total Group
(Concurrent Sample; Revised Trial Battery)

[Table body illegible in the source copy.]


Table 4 ABLE Substantive Scales; Summary


[Table body illegible in the source copy.]


Table 5

[Table body illegible in the source copy.]


Principal component analysis, varimax rotation.
Note: N = 8367


List  2 


Criterion  Composites^ 


Core  Technical  Proficiency  -  a)  hands-on  tests  of  MOS-specific  technical 
knowledge  and  skills;  and  b)  tests  of  school  and  job  knowledge. 


General  Soldiering  Proficiency  -  a)  hands-on  tests  of  general  soldiering 
skill;  and  b)  general  soldiering  knowledge  and  skill  test  items. 


Effort  &  Leadership  -  a)  supervisory  and  peer  ratings  of  effort  and 
leadership,  overall  effectiveness,  MOS  effectiveness  and  predicted  combat 
effectiveness;  and  b)  letters  and  certificates  of  commendation  and  other 
achievements. 


Personal  Discipline  -  a)  supervisory  and  peer  ratings  of  personal  control 
and  discipline;  and  b)  disciplinary  actions  and  other  negative  indicators 
in  personnel  files. 


Physical  Fitness  &  Military  Bearing  -  a)  supervisory  and  peer  ratings  of 
physical  fitness  and  military  bearing;  and  b)  physical  readiness  tests. 


^Data  gathered  at  same  time  as  Trial  Battery  was  administered,  i.e.,  summer 
and  fall  of  1985. 


Table  6  Validities  of  ABLE  Scales  for  Job  Performance  Criteria: 

Zero-Order  Correlations 

(Revised  Trial  Battery;  Concurrent  Validity  Study) 


Predictor

Criterion: Core Technical Proficiency; General Soldiering Proficiency;
Effort & Leadership; Personal Discipline; Physical Fitness & Military
Bearing


Surgency:

. Dominance .01

Achievement:

. Self Esteem .02

. Work Orientation .02

. Energy Level .02

Adjustment:

. Emotional Stability .02

Agreeableness (Likeability):

. Cooperativeness .01

Dependability:

. Traditional Values .03

. Non-delinquency .03

. Conscientiousness .02

Others:

. Internal Control .04

. Physical Condition *.04

Response Validity Scales:

. Non-Random Response^ .13

. Social Desirability *.07

. Poor Impression *.04

. Self-Knowledge *.04


.01 

.15 

.02 

.18 

.01 

.20 

.U 

.20 

.02 

.23 

.18 

.21 

.02 

.22 

.14 

.25 

.02 

.17 

.12 

.16 

.02 

.15  1 

1 

.06 

.13 

.25 

.16 

.07 

.12 

.29 

.14 

.02 

.18 

.23 

.22 

.03 

.13 

.13 

.13 

-.05 

.09  -.03 

E»] 

.14 

.07 

.10 

.02 

-.06 

.02 

.05 

.07 

-.05 

-.15 

-.15 

-.16 

-.03 

.07 

.05 

.13 

^Correlations are based on unscreened data for this scale. N varies from 0424 to 9322 for this scale.

Note:  N  varies  from  7666  to  8477. 


Note:  A  box  indicates  notable  predictor/criterion  construct  relationships. 




Table  7 




... from 659 to 675 for group scoring low on "Non-Random Response" scale
... from 8336 to 8477 for group scoring high on "Non-Random Response" scale
A statistically significant difference at p < .05 is approximately .04.


Table 10 Incremental Validities of ABLE Scales When "Self-Knowledge"
Scale Is Included in Predictor Equation
(Linear and Non-Linear Models)




The multiple correlation based on moderated regression. It increments the
multiple correlation by including an interaction term. A difference between R and R^j^
suggests that Self-Knowledge moderates the validity.
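
The comparison described in this note can be sketched in Python as follows. This is a minimal illustration, not the procedure used in the report: the data are invented, the helper names `solve` and `multiple_r` are hypothetical, and ordinary least squares via the normal equations is an assumed fitting method. The sketch fits a linear model (scale plus moderator) and a moderated model (adding their product) and compares the two multiple correlations.

```python
from math import sqrt

def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x

def multiple_r(columns, y):
    """Multiple correlation of y with an intercept plus the predictor columns."""
    n = len(y)
    design = [[1.0] + [col[i] for col in columns] for i in range(n)]
    p = len(design[0])
    xtx = [[sum(row[j] * row[k] for row in design) for k in range(p)] for j in range(p)]
    xty = [sum(row[j] * y[i] for i, row in enumerate(design)) for j in range(p)]
    coef = solve(xtx, xty)
    fitted = [sum(c * v for c, v in zip(coef, row)) for row in design]
    mean_y = sum(y) / n
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    sst = sum((yi - mean_y) ** 2 for yi in y)
    return sqrt(max(0.0, 1.0 - sse / sst))

# Invented data: the scale predicts the criterion only when the moderator is 1.
scale = [1, 2, 3, 4, 1, 2, 3, 4]
moderator = [0, 0, 0, 0, 1, 1, 1, 1]
criterion = [1, 1, 1, 1, 1, 2, 3, 4]

r_linear = multiple_r([scale, moderator], criterion)
interaction = [s * m for s, m in zip(scale, moderator)]
r_moderated = multiple_r([scale, moderator, interaction], criterion)
print(f"R (linear) = {r_linear:.3f}, R (moderated) = {r_moderated:.3f}")
```

In this constructed case the moderated model fits the criterion exactly, so the gap between the two values of R is large; with real data the gap would typically be much smaller.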


Table  11  Effect  Size  of  Differences  Between  Honest  and  Faking  Conditions  for 
ABLE  Response  Validity  Scales  and  Substantive  Scales 




Table 12 Moderating Effects of "Social Desirability" Scale on Correlations
Between ABLE Scales and Job Performance Criteria




PROJECT  A  VALIDITY  RESULTS; 

THE  RELATIONSHIP  BETWEEN  PREDICTOR  AND  CRITERION  DOMAINS 


Jeffrey  J.  McHenry 
American  Institutes  for  Research 


Leaetta  M.  Hough 
Jody  L.  Toquam 
Mary  Ann  Hanson 
Steven  Ashworth 

Personnel  Decisions  Research  Institute 


Presented  at  the  Annual  Conference  of  the 
Society  for  Industrial  and  Organizational  Psychology 

Atlanta,  Georgia 

April  1987 


The  views  expressed  in  this  paper  are  those  of  the  authors  and  do  not 
necessarily  reflect  the  official  opinions  and  policies  of  the  U.S.  Army 
Research  Institute  or  the  Department  of  the  Army. 




Project  A  Validity  Results: 

The Relationship between Predictor and Criterion Domains


The purpose of this paper is to describe the relationship between the
predictor scores described in the Peterson, Hough, Dunnette, Rosse, Houston,
Toquam, and Wing (1987) paper and the criterion scores described in the
Campbell, Harris, McHenry, and Arabian (1987) paper.

This paper includes five parts. In the first part, we describe the
creation of predictor composite scores from the predictor test and scale
scores described in Peterson et al. (1987). In the second part, we show the
relationship between the predictor composite scores within each predictor
domain and the five job performance constructs described by Campbell et al.
(1987). In the third part, we demonstrate how the new predictor tests
increment the validity of the Army's current selection battery, the Armed
Services Vocational Aptitude Battery (ASVAB). In the fourth part, we
describe the relationship between the new predictor tests and two "method
factors" that we identified in our analyses of the job performance measures.
Finally, in the fifth part, we discuss how the predictor-criterion
relationships uncovered in the validity analyses contribute to the
understanding of job performance in the Army.


Formation  of  Predictor  Composites 


The preliminary analyses of the new Project A predictor tests indicated
that 65 reliable predictor scores could be computed from the six spatial
tests, the 10 computer tests, and the temperament/personality, vocational
interest, and job reward preference inventories (Peterson et al., 1987). In
addition, scores from the nine ASVAB subtests were available from Army
records. Table 1 shows how these predictor scores were distributed among
various domains within the predictor space. The ASVAB subtests measured
nine cognitive abilities. The spatial tests measured six different aspects
of spatial ability. The ten computer tests yielded 20 measures of
perceptual-psychomotor abilities. The ABLE provided measures of 11
temperament/personality traits. The AVOICE assessed 22 vocational
interests. Finally, the JOB measured six types of job reward preferences.

There were several problems that precluded using these 74 scores
directly in the Project A validity analyses. First, as Table 2 shows, the
number of subjects with complete predictor and criterion data within the
nine target Project A jobs ranged from 289 for Single Channel Radio Operator
to 597 for Military Police (Young, Harris, Hoffman & Houston, 1987). Even
for Military Police, the ratio of subjects to variables was only 8:1. Our
intent was to use multiple regression to estimate the correlation between
the predictors and job performance constructs. This ratio is far less than
the ratio of 10:1 that many statisticians say is the minimum required to
obtain stable estimates of multiple regression coefficients and the
coefficient of multiple correlation R. Since we were faced with a fixed
number of subjects, the only way to improve this ratio was to reduce the
number of predictor scores.
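
The arithmetic behind the subjects-to-variables argument can be checked with a few lines of Python. This is only an illustrative sketch: the function name `subjects_per_variable` is hypothetical, and the sample sizes (289 and 597) and the count of 74 predictor scores are the figures quoted in the paragraph above.

```python
def subjects_per_variable(n_subjects: int, n_predictors: int) -> float:
    """Return the ratio of subjects to predictor variables."""
    if n_predictors <= 0:
        raise ValueError("n_predictors must be positive")
    return n_subjects / n_predictors

# Smallest and largest job samples quoted in the text; 74 predictor scores.
samples = {"Single Channel Radio Operator": 289, "Military Police": 597}
ratios = {job: subjects_per_variable(n, 74) for job, n in samples.items()}

for job, ratio in ratios.items():
    status = "meets" if ratio >= 10 else "falls short of"
    print(f"{job}: {ratio:.1f} subjects per predictor, {status} the 10:1 guideline")
```

Both samples fall short of the 10:1 guideline when all 74 scores are used, which is the motivation the authors give for reducing the number of predictors.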

Second,  scores  from  many  of  the  predictor  tests  were  highly 




[Table 1: distribution of predictor scores among domains of the predictor
space. Table body not legible in the scanned source.]


All measures except the ASVAB were developed specifically for Project A.
The ABLE included 4 additional response validity scales.


Infantryman
Cannon Crewmember
Armor Crewman
Single Channel Radio Operator
Light Wheel Vehicle Mechanic
Motor Transport Operator
Administrative Specialist
Medical Specialist
Military Police




intercorrelated. For example, the average intercorrelation among the six
Project A spatial tests was .46. This multicollinearity results in unstable
estimates of multiple regression coefficients. This situation can be
remedied by combining the correlated test scores into a single composite.
To the extent that the tests are highly intercorrelated, the composite score
should contain all of the reliable variance included in any of the
individual test scores. Also, the composite should be more reliable than
any of the individual test scores, since it will be based on more items than
the score from any single test.
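
The claim that a composite is more reliable than its parts can be illustrated with the Spearman-Brown (standardized alpha) formula. The sketch below is an assumption-laden illustration rather than a calculation from the paper: it treats the six spatial tests as roughly parallel parts whose average intercorrelation is the .46 quoted above, and the function name `composite_reliability` is hypothetical. The paper itself does not report this reliability figure.

```python
def composite_reliability(k: int, mean_r: float) -> float:
    """Standardized alpha for a unit-weighted sum of k parts.

    k      -- number of parts (tests) in the composite
    mean_r -- average intercorrelation among the parts
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    return k * mean_r / (1.0 + (k - 1) * mean_r)

# Six spatial tests with an average intercorrelation of .46 (from the text).
spatial_alpha = composite_reliability(6, 0.46)
print(f"Estimated composite reliability: {spatial_alpha:.2f}")
```

Under these assumptions the six-test composite has an estimated reliability of about .84, well above the .46 average correlation between any two of the tests.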

Because of these two problems, the 74 predictor test and scale scores
were combined into 24 predictor composites before predictor-criterion
relationships were explored. With one exception (which will be noted
below), these composites were formed simply by summing standardized test or
scale scores; that is, in all instances but one, unit weights were used to
compute composite scores from test and scale scores.

Three principles were used to guide the formation of composite scores.
First, we attempted to keep the number of composites to a minimum. We
expected that this would increase the stability of all of the multivariate
statistics we intended to compute in exploring predictor-criterion
relationships. Second, we sought to maintain homogeneity or internal
consistency within composites. To guide in this effort, we studied the
intercorrelations among test or scale scores. We also used principal
components analysis to identify tests or scales with similar patterns of
factor loadings. Test or scale scores with reasonably high
intercorrelations and similar patterns of factor loadings tended to be
grouped into the same composite. We believed that this would eliminate any
problems associated with predictor multicollinearity. Third, even if we
found that two or more test or scale scores were reasonably highly
correlated and had similar patterns of factor loadings, we grouped them into
the same composite only if we expected that they would have similar patterns
of correlations with our job performance constructs. Expert judgments of
expected predictor-criterion relationships were available to direct us in
this task (Wing, Peterson & Hoffman, 1984).

Figure 1 shows how the nine ASVAB subtests were combined into four
composite scores. The four composites were Technical, Quantitative, Verbal,
and Speed. In computing the Technical composite score, the Electronics
Information subtest received a weight of one-half, while the Mechanical
Comprehension and Auto Shop subtests received unit weights. The weight for
the Electronics Information subtest was only one-half because a factor
analysis indicated that the loading of the Electronics Information on the
Technical factor of the ASVAB was only about one-half as large as the
loading of the Mechanical Comprehension and Auto Shop subtests.
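
The weighting scheme described for the Technical composite can be sketched as a sum of standardized subtest scores, with a weight of one-half for Electronics Information and unit weights elsewhere. The code below is a minimal sketch under stated assumptions: the raw scores are invented, the function names `standardize` and `composite` are hypothetical, and standardizing with the population standard deviation within the sample is an assumption (the paper does not say which standardization was used).

```python
from statistics import mean, pstdev

def standardize(scores):
    """Convert raw scores to z-scores (population standard deviation assumed)."""
    m, s = mean(scores), pstdev(scores)
    if s == 0:
        raise ValueError("scores have zero variance")
    return [(x - m) / s for x in scores]

def composite(tests, weights=None):
    """Weighted sum of standardized scores; any unlisted test gets weight 1."""
    weights = weights or {}
    names = list(tests)
    z = {name: standardize(tests[name]) for name in names}
    n = len(z[names[0]])
    return [sum(weights.get(name, 1.0) * z[name][i] for name in names)
            for i in range(n)]

# Invented raw scores for three soldiers on the three Technical subtests.
raw = {
    "electronics_information": [40, 50, 60],
    "mechanical_comprehension": [45, 50, 55],
    "auto_shop": [30, 50, 70],
}
technical = composite(raw, weights={"electronics_information": 0.5})
print([round(score, 2) for score in technical])
```

Calling `composite(raw)` with no weights gives the unit-weighted form that the paper reports using for every other composite.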

As noted above, the six spatial tests were all highly intercorrelated.
Therefore, as Figure 2 shows, these six tests were combined into a single
composite score.

Six composite scores were computed from the 20 perceptual-psychomotor
test scores from the computer battery. These six composites were
Psychomotor, Complex Perceptual Speed, Complex Perceptual Accuracy, Number
Speed and Accuracy, Simple Reaction Speed, and Simple Reaction Accuracy.
Figure 3 shows how the 20 test scores were combined into these six
composites.


Figure 1. Formation of general cognitive ability composites from ASVAB
subtests. [Figure not legible in source; surviving label: Mechanical
Comprehension.]


Figure 2. Formation of spatial ability composite from spatial battery test
scores. [Figure not legible in source; surviving labels: Object Rotation,
Orientation, Figural Reasoning.]


Figure 3. Formation of perceptual-psychomotor ability composites from
computer battery test scores.


Four temperament/personality composites were computed from the ABLE
scales (see Figure 4). The composites included Achievement Orientation,
Dependability, Adjustment, and Physical Condition. Four of the 11 ABLE
scales were not included in any composite.

Figure 5 shows that six vocational interest composites were computed
from the 22 AVOICE scales. These composites were Skilled Technical,
Structural/Machines, Combat-Related, Audiovisual Arts, Food Service, and
Protective Services.

Finally, the six scales of the JOB were combined into three composites
(see Figure 6). These were Organizational and Co-Worker Support, Routine
Work, and Job Autonomy.


Relationships  between  Predictor  Domains  and  Job  Performance  Constructs 


Job  Performance  Constructs 

The performance criteria used for this study were the five job
performance constructs described in the Campbell et al. (1987) paper. Table
3 provides complete definitions of these five constructs. The first
construct, Core Technical Proficiency, refers to a soldier's performance on
those tasks that are central to the soldier's job. The second construct,
General Soldiering Proficiency, represents a soldier's performance on tasks
that are required of all soldiers, regardless of their assigned job. These
first two constructs represent the "can do" portion of the job performance
space. The third performance construct is Effort and Leadership. This
construct reflects the degree to which a soldier tries hard on the job, even
under adverse or hazardous conditions, and provides support and
encouragement for peers. The fourth construct, Personal Discipline,
represents the degree to which a soldier follows Army rules and regulations,
maintains high standards of personal conduct, and avoids disciplinary
problems. The fifth construct, Physical Fitness and Military Bearing,
represents the degree to which a soldier maintains an appropriate military
appearance and bearing and stays in good physical condition. These final
three performance constructs (Effort and Leadership, Personal Discipline,
and Physical Fitness and Military Bearing) represent the "will do" portion
of the job performance space, though Effort and Leadership also includes
some elements of "can do" performance.


Hypothesized Relationships between Predictor Domains and Job Performance
Constructs

Figure 7 depicts the expected relationships between the predictor
domains and the five job performance constructs. From the cognitive portion
of the predictor space, four ASVAB composite scores were available for
general cognitive ability, a spatial battery composite score was available
for spatial ability, and six computer battery composite scores were
available for perceptual-psychomotor ability. It was hypothesized that
these cognitive predictor composite scores would be useful for predicting
scores on the two "can do" performance constructs, Core Technical


Self-Esteem, Work Orientation, Energy Level -> Achievement Orientation
Conscientiousness, Non-Delinquency -> Dependability
Emotional Stability -> Adjustment
Physical Condition -> Physical Condition


Figure 4. Formation of temperament/personality composites from ABLE scale
scores. Four ABLE scales were not used in computing composite scores.
These were Dominance, Traditional Values, Cooperativeness, and Internal
Control.


Clerical/Administrative, Medical Services, Leadership/Guidance,
Science/Chemical, Data Processing, Mathematics,
Electronic Communications -> Skilled Technical
Mechanics, Heavy Construction, Electronics,
Vehicle/Equipment Operator -> Structural/Machines
Combat, Rugged Individualism, Firearms Enthusiast -> Combat-Related
Drafting, Audiographics, Aesthetics -> Audiovisual Arts
Food Service Professional, Food Service Employee -> Food Service
Law Enforcement, Fire Protection -> Protective Services


Figure 5. Formation of vocational interest composites from AVOICE scale
scores.


Figure 6. Formation of job reward preference composites from JOB scale
scores.




Table 3

Definitions of the Job Performance Constructs


Core Technical Proficiency

This performance construct represents the proficiency with which the soldier
performs the tasks that are "central" to the job. The tasks represent the
core of the job and they are the primary definers of the job. For example,
the first-tour Armor Crewman starts and stops the tank engines; prepares the
loader's station; loads and unloads the main gun; boresights the M60A3;
engages targets with the main gun; and performs misfire procedures. This
performance construct does not include the individual's willingness to
perform the task or the degree to which the individual can coordinate
efforts with others. It refers to how well the individual can execute the
core technical tasks the job requires, given a willingness to do so.


General Soldiering Proficiency

In addition to the core technical content specific to a job, individuals in
every job also are responsible for being able to perform a variety of
general soldiering tasks (e.g., determines grid coordinates on military
maps; puts on, wears and removes M17 series protective mask with hood;
determines a magnetic azimuth using a compass; collects/reports
information - SALUTE; and recognizes and identifies friendly and threat
aircraft). Performance on this construct represents overall proficiency on
these general soldiering tasks. Again, it refers to how well the individual
can execute general soldiering tasks, given a willingness to do so.


Effort and Leadership

This performance construct reflects the degree to which the individual
exerts effort over the full range of job tasks, perseveres under adverse or
dangerous conditions, and demonstrates leadership and support toward peers.
That is, can the individual be counted on to carry out assigned tasks, even
under adverse conditions, to exercise good judgment, and to be generally
dependable and proficient? While appropriate knowledges and skills are
necessary for successful performance, this construct is only meant to
reflect the individual's willingness to do the job required and to be
cooperative and supportive with other soldiers.


Personal Discipline

This performance construct reflects the degree to which the individual
adheres to Army regulations and traditions, exercises personal self-control,
demonstrates integrity in day-to-day behavior, and does not create
disciplinary problems. People who rank high on this construct show a
commitment to high standards of personal conduct.


Physical Fitness and Military Bearing

This performance construct represents the degree to which the individual
maintains an appropriate military appearance and bearing and stays in good
physical condition.




Figure  7.  Hypothesized  predictor-criterion  relationships. 




Proficiency and General Soldiering Proficiency. It was hypothesized that
the cognitive predictor composite scores also would be useful for predicting
scores on Effort and Leadership, since Effort and Leadership also contained
some components of "can do" performance.

The four ABLE temperament/personality composite scores, the six AVOICE
vocational interest composite scores, and the three job reward preference
composite scores from the JOB all were intended to serve as measures of the
non-cognitive portion of the predictor space. It was hypothesized that
these predictor composites would be most useful for predicting the "will do"
job performance constructs, including Effort and Leadership, Personal
Discipline, and Physical Fitness and Military Bearing.


Assessing the Relationships between Predictor Domains and Job Performance
Constructs

Statistical procedures. To assess the relationships between predictor
domains and job performance constructs, multiple linear regression was used
to determine the multiple correlation R of the predictor composites within
each domain with each of the five job performance constructs. This was done
separately for each of the nine jobs. Each R was corrected for range
restriction and adjusted for shrinkage.
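
For readers who want to see how a multiple correlation follows from a set of correlations, the sketch below computes R from the predictor intercorrelation matrix and the vector of predictor-criterion correlations, using R^2 = r' Rxx^-1 r. It is an illustration only: the helper names `solve` and `multiple_r` are hypothetical, and the example correlations (.46, .50, .40) are invented for the demonstration, not taken from the Project A results.

```python
from math import sqrt

def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x

def multiple_r(r_xx, r_xy):
    """Multiple correlation from predictor intercorrelations and validities.

    r_xx -- square matrix of correlations among the predictors
    r_xy -- correlations of each predictor with the criterion
    """
    betas = solve(r_xx, r_xy)  # standardized regression weights
    r_squared = sum(b * r for b, r in zip(betas, r_xy))
    return sqrt(max(0.0, r_squared))

# Invented example: two composites correlated .46, with validities .50 and .40.
r_example = multiple_r([[1.0, 0.46], [0.46, 1.0]], [0.50, 0.40])
print(f"Multiple R = {r_example:.3f}")
```

Because the two invented predictors overlap, the resulting R is only modestly larger than the better single validity, which is the practical consequence of predictor intercorrelation.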

The procedure used to correct R for range restriction is one described
in Lord and Novick (1968). The procedure adjusts the intercorrelations
among the ASVAB subtests so that they match the intercorrelations obtained
in a 1980 youth population (Mitchell & Hanser, 1984). The correlations
among the predictor composite scores and the performance construct scores
are then adjusted according to their correlations with the ASVAB subtests.
This means that the correction procedure takes into account any range
restriction related to the abilities measured in the ASVAB. However, it
fails to consider factors that may reduce the range of predictor scores that
are unrelated to the abilities tapped by the ASVAB.

For example, as Young et al. (1987) have described, most of the
soldiers in this study enlisted in the Army between July 1983 and June 1984.
They took the Project A predictor and job performance tests in the summer or
fall of 1985, on average 19 months after they had reported for duty. There
were many soldiers who enlisted in the Army at the same time as these
soldiers who would have been eligible for our sample, but who left the Army
as a result of disciplinary problems. In many instances, these problems
were unrelated to any of the abilities tapped by the ASVAB. However, the
problems might have been related to some of the temperaments and personality
traits measured in the ABLE; indeed, several of the ABLE scales were
designed to measure temperaments and traits associated with disciplinary
problems. The attrition of these soldiers means that the variance of the
temperament/personality scores in our soldier sample is less than the
variance that we would expect to obtain in an unselected sample of 18-, 19-,
and 20-year-olds. Unfortunately, without data from an unselected sample, it
is impossible to know the extent of this range restriction, or to correct
our validity coefficients for such range restriction.

This means that many of the validity coefficients reported in the
following tables are underestimates of the true validities that would be
obtained in an unselected sample. The problem is probably not very serious




for the spatial ability composite or for the six perceptual-psychomotor
ability composites, which are reasonably highly correlated with scores on
the ASVAB. Much of the range restriction in these composites is probably
alleviated by correcting for range restriction in the ASVAB. However, the
problem is more serious for the composites from the three non-cognitive
predictor domains. These composites tend to be relatively uncorrelated with
ASVAB scores. Moreover, especially in the case of the temperament/
personality composites from the ABLE, there is reason to believe that there
is a significant amount of range restriction unrelated to the abilities
tapped by the ASVAB. The validities reported for these predictor domains,
and especially for the ABLE, are likely to be underestimates of the true
validities.

The procedure used to adjust R for shrinkage was developed by Claudy
(1978). The adjustment is intended to yield an estimate of R that is equal
to the expected value of the multiple correlation between the predictor
scores and the criterion in the population from which the sample was drawn.
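
To show what a shrinkage adjustment does, the sketch below applies the widely used Wherry adjustment, R_adj^2 = 1 - (1 - R^2)(N - 1)/(N - k - 1). This is a substitute chosen for illustration and is not Claudy's (1978) formula, which is not reproduced in this paper. The function name `wherry_adjusted_r` and the observed R of .50 are assumptions; the sample size of 289 and the count of 24 composites are figures given in the text.

```python
from math import sqrt

def wherry_adjusted_r(r: float, n_subjects: int, n_predictors: int) -> float:
    """Shrinkage-adjusted multiple correlation (Wherry formula, not Claudy's)."""
    if n_subjects - n_predictors - 1 <= 0:
        raise ValueError("need more subjects than predictors plus one")
    adjusted_r2 = 1.0 - (1.0 - r * r) * (n_subjects - 1) / (n_subjects - n_predictors - 1)
    return sqrt(max(0.0, adjusted_r2))  # negative estimates are set to zero

# Hypothetical observed R of .50 in the smallest job sample (N = 289).
with_24 = wherry_adjusted_r(0.50, 289, 24)
with_4 = wherry_adjusted_r(0.50, 289, 4)
print(f"24 predictors: {with_24:.3f}; 4 predictors: {with_4:.3f}")
```

The adjustment grows with the number of predictors relative to the sample size, which is one reason a small number of composites was preferred over the full set of test and scale scores.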

Relationships. Given six predictor domains and five job performance
constructs, there were 30 multiple correlations generated for each of the
nine jobs. (The one exception was Infantryman, which was not scored on one
of the performance constructs, General Soldiering Proficiency. For
Infantryman, only 24 validity coefficients were computed.) These Rs were
averaged across the nine jobs to obtain the mean validity for each predictor
domain by performance construct combination.

The 30 mean Rs are reported in Table 4. The table shows that the
hypothesized predictor-criterion relationships (presented in Figure 7) were,
by and large, confirmed.

The general cognitive ability composites, computed from the ASVAB, were
the best predictors of Core Technical Proficiency (mean R = .63) and General
Soldiering Proficiency (mean R = .65). These validity coefficients are
extraordinarily high, especially when one considers that the ASVAB was
administered to these subjects on average two years prior to the collection
of job performance data. The spatial ability composite and the perceptual-
psychomotor ability composites also provided excellent prediction of Core
Technical Proficiency and General Soldiering Proficiency.

The general cognitive ability composites also provided reasonable
prediction of Effort and Leadership (mean R = .31), as we had hypothesized
they would. The mean R with Effort and Leadership was only slightly lower for
the composite scores from the other two cognitive domains, spatial ability
(mean R = .25) and perceptual-psychomotor ability (mean R = .26).

However,  the  composites  within  the  three  cognitive  domains  did  not 
predict  performance  on  Personal  Discipline  or  Physical  Fitness  and  Military 
Bearing  very  well.  None  of  the  six  mean  multiple  correlations  between  these 
three  predictor  domains  and  two  performance  constructs  exceeded  .20. 

The  best  prediction  of  Effort  and  Leadership,  Personal  Discipline,  and 
Physical  Fitness  and  Military  Bearing  was  provided  by  the  temperament/ 
personality  composites  from  the  ABLE.  The  mean  R  for  Effort  and  Leadership 
was  .33.  The  ABLE  composite  that  contributed  most  to  this  correlation  was 
Achievement  Orientation.  For  Personal  Discipline,  the  mean  R  was  .32,  with 
the  ABLE  Dependability  composite  making  the  largest  contribution  to  this  R. 


Table 4

Mean Validity for the Composite Scores within Each Predictor Domain across
Nine Army Enlisted Jobs

[Table body not legible in the scanned source.]


Validity coefficients were corrected for range restriction and adjusted for shrinkage.
K is the number of predictor composites.


Finally, the ABLE composites correlated .37 on average with Physical Fitness
and Military Bearing. The key predictor of this performance construct was
the ABLE Physical Condition composite.

On the other hand, the temperament/personality domain provided worse
prediction of the two "can do" performance criteria than any of the other
five predictor domains. The mean R for Core Technical Proficiency was
only .26, while the mean R for General Soldiering Proficiency was .25.

The relationships between the vocational interest composites and the
job performance constructs were somewhat different than expected. For the
interest composites, the pattern of correlations across the five job
performance constructs was more like the pattern for the cognitive predictor
domains than the pattern for the temperament/personality domain. The
highest mean Rs were with Core Technical Proficiency (mean R = .35) and
General Soldiering Proficiency (mean R = .34). The lowest mean Rs involved
prediction of Personal Discipline (mean R = .13) and Physical Fitness and
Military Bearing (mean R = .12). The mean validity for Effort and
Leadership was .24.

The pattern of correlations for the job reward preference composites
was similar to that for the vocational interest composites.

As a further test of the hypothesized predictor-criterion relationships
presented in Figure 7, the predictor composites were grouped into two sets.
The 11 general cognitive ability, spatial ability, and
perceptual-psychomotor ability composites were grouped into a set of
cognitive composites. The 13 temperament/personality, vocational interest,
and job reward preference composites were grouped into a set of
non-cognitive composites. For each set the R was computed with each of the
five job performance constructs within each of the nine jobs. Mean Rs from
these analyses are presented in Table 5.
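
As a rough illustration of how a mean validity of this kind can be
computed, the sketch below obtains a multiple correlation R from predictor
intercorrelations and predictor-criterion correlations, applies a shrinkage
adjustment, and averages the result across jobs. The function names are
hypothetical. The Wherry adjustment is an assumption made for illustration
only; the paper states that coefficients were adjusted for shrinkage but
does not say which formula was used. The range restriction correction is
omitted here.

```python
from math import sqrt


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def multiple_r(rxx, rxy):
    """Multiple correlation of one criterion with a set of predictors.

    rxx: predictor intercorrelation matrix (list of lists).
    rxy: correlations of each predictor with the criterion.
    """
    beta = solve(rxx, rxy)  # standardized regression weights
    r_squared = sum(b * r for b, r in zip(beta, rxy))
    return sqrt(max(r_squared, 0.0))


def wherry_adjusted_r(r, n, k):
    """Shrinkage-adjusted R for sample size n and k predictors (assumed formula)."""
    adjusted = 1.0 - (1.0 - r * r) * (n - 1) / (n - k - 1)
    return sqrt(max(adjusted, 0.0))


def mean_validity(per_job_rs):
    """Average the per-job validity coefficients."""
    return sum(per_job_rs) / len(per_job_rs)
```

In use, `multiple_r` would be called once per job and performance
construct, each result adjusted with `wherry_adjusted_r`, and the adjusted
values passed to `mean_validity`.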

The pattern of correlations is very similar to that predicted in Figure
7. The cognitive composites provide the best prediction of Core Technical
Proficiency (mean R = .65) and General Soldiering Proficiency (mean
R = .69). The non-cognitive composites provide the best prediction of
Personal Discipline (mean R = .35) and Physical Fitness and Military Bearing
(mean R = .38). The non-cognitive composites also predict Effort and
Leadership better than the cognitive composites, though the difference is
not very large (mean Rs = .38 and .32, respectively).

Table 5 also shows that, when all 24 composites are used to predict
each performance construct, the mean Rs are .67 for Core Technical
Proficiency, .70 for General Soldiering Proficiency, .44 for Effort and
Leadership, .37 for Personal Discipline, and .42 for Physical Fitness and
Military Bearing. These results indicate that for at least two of the job
performance constructs (Effort and Leadership, and Physical Fitness and
Military Bearing) the best prediction is obtained when both cognitive and
non-cognitive predictors are used.

The one surprising result in Table 5 is the high correlation between
the non-cognitive predictors and the two "can do" performance constructs.
For both performance constructs, the mean R was .44. In fact, the
non-cognitive composites predicted "can do" performance better than they
predicted "will do" performance.


Table 5

Mean Validity for the Cognitive, the Non-Cognitive, and All Predictor
Composites across Nine Army Enlisted Jobs

[Table body illegible in the source scan.]

Validity coefficients were corrected for range restriction and adjusted for
shrinkage. K is the number of predictor composites.


The Incremental Validity of the Project A Predictor Tests


An important question for the Army sponsors of the present study was
how to improve on the validity of decisions made using the Army's current
selection and classification instrument, the ASVAB. To help answer that
question, the validity of the general cognitive ability composite scores
(computed from the ASVAB) was compared to the validity obtained when the
composite scores from a predictor domain were used to supplement the general
cognitive ability composites. This was done for each performance construct
within each of the nine jobs. Validities were then averaged across the nine
jobs. The resulting mean validities are reported in Table 6.
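
The comparison described above can be sketched for the simplest case of
one baseline predictor and one added predictor, where the multiple
correlation has a closed form. This is a simplified illustration, not the
procedure used in the study, which worked with sets of composite scores
and corrected coefficients. The function names are hypothetical, and any
values passed to them are illustrative rather than taken from the report.

```python
from math import sqrt


def r_two_predictors(r1, r2, r12):
    """Multiple R of a criterion with two predictors.

    r1, r2: correlations of each predictor with the criterion.
    r12: correlation between the two predictors.
    """
    if abs(r12) >= 1.0:
        raise ValueError("predictors must not be perfectly correlated")
    r_squared = (r1 ** 2 + r2 ** 2 - 2.0 * r1 * r2 * r12) / (1.0 - r12 ** 2)
    return sqrt(max(r_squared, 0.0))


def incremental_validity(r_base, r_new, r_between):
    """Gain in validity from adding a new predictor to a baseline predictor."""
    return r_two_predictors(r_base, r_new, r_between) - abs(r_base)
```

The sketch makes one property of incremental validity easy to see: a new
predictor that is valid on its own adds little when it overlaps heavily
with the baseline, because the gain depends on the criterion variance it
predicts that the baseline does not.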

Table 6 shows that none of the predictor domains added more than .02 to
the general cognitive ability composites' validity for predicting Core
Technical Proficiency. Similarly, no predictor domain added more than .03
to the general cognitive ability composites' validity for predicting General
Soldiering Proficiency. In both instances, the predictor composite that
added the greatest incremental validity was the spatial ability composite.

Most predictor domains also added very little to the prediction of
Effort and Leadership, Personal Discipline, and Physical Fitness and
Military Bearing from the general cognitive ability composites. The one
exception was the temperament/personality domain. The four
temperament/personality composites added .11 to the validity for predicting
Effort and Leadership, .19 to the validity for predicting Personal
Discipline, and .21 to the validity for predicting Physical Fitness and
Military Bearing.

Table 7 provides another means for looking at the incremental validity
of the Project A predictor tests. The table shows that the seven new
Project A cognitive composites (i.e., the spatial ability composite plus the
six perceptual-psychomotor ability composites) predict job performance
almost as well as the four general cognitive ability composites from the
ASVAB. For Core Technical Proficiency and General Soldiering Proficiency,
the validity of the new Project A cognitive composites is quite high (mean
R = .59 and .65, respectively). However, the performance variance predicted
by the new Project A cognitive composites is virtually identical to the
performance variance predicted by the ASVAB. The new Project A cognitive
composites increment the validity for Core Technical Proficiency by .02 and
increment the validity for General Soldiering Proficiency by .04. (At first
glance, those results were disappointing to many of us on the Project A
research team. However, as we had time to reflect, we decided that we had
established that the Army was already doing a very good job of predicting
"can do" job performance, which our Army sponsors were pleased to hear.
Also, as a practical matter, there simply isn't much that one can do to
improve on a test with a validity of .63 or .65 for predicting job
performance two years later.)

Table 7 also shows that the 13 non-cognitive composites predict Effort
and Leadership (mean R = .38), Personal Discipline (mean R = .35), and
Physical Fitness and Military Bearing (mean R = .38) better than the four
general cognitive ability composites predict these three job performance
constructs. When the ASVAB composites are added to the non-cognitive
composites, the mean validity for Effort and Leadership increases by .05,
the mean validity for Personal Discipline increases by .02, and the validity
for Physical Fitness and Military Bearing increases by .03.


Table 6

[Table caption and body illegible in the source scan.]

Validity coefficients were corrected for range restriction and adjusted for
shrinkage. Incremental validity refers to the increase in R afforded by the
new predictors above and beyond the R for the Army's current predictor
battery, the ASVAB. K is the number of predictor composites.


Table 7

Mean Validity for the Project A Cognitive and the Project A Non-Cognitive
Predictor Composites across Nine Army Enlisted Jobs

                                        General Cognitive   New Project A
                                        (ASVAB)             Cognitive
                                        Composites          Composites
Job Performance Construct               (K = 4)             (K = 7)

Core Technical Proficiency                 .63                 .59
General Soldiering Proficiency             .65                 .65
Effort and Leadership                      .31                 .27
Personal Discipline                        .16                 .13
Physical Fitness and Military Bearing      .20                 .14

[Remaining columns illegible in the source scan.]

Validity coefficients were corrected for range restriction and adjusted for
shrinkage. Incremental validity refers to the increase in R afforded by the
new predictors above and beyond the R for the Army's current predictor
battery, the ASVAB. K is the number of predictor composites.

The results in Table 7 are consistent with our hypotheses (see Figure
7) that: (1) cognitive ability composites would be the most valid
predictors of Core Technical Proficiency and General Soldiering Proficiency;
(2) non-cognitive composites would be the most valid predictors of Personal
Discipline and Physical Fitness and Military Bearing; and (3) both cognitive
and non-cognitive predictors would be useful for predicting Effort and
Leadership.

A comparison of Tables 6 and 7 shows that almost all of the incremental
validity in the prediction of the three "will do" performance constructs is
provided by the ABLE. When the ABLE composites and the ASVAB composites are
used to predict Effort and Leadership, the mean R is .42. When the AVOICE
composites and the JOB composites are added to the ABLE and ASVAB
composites, the mean validity increases only by .01. Similarly, the AVOICE
and JOB composites add only .02 to the prediction of Personal Discipline and
contribute nothing to the prediction of Physical Fitness and Military
Bearing.


Relationships between Predictor Domains and "Method Factors"


In their paper, Campbell et al. (1987) described written test and
rating "method factors" that emerged from a structural analysis of the job
performance measures. As Campbell et al. noted, the term "method factor" is
probably a misnomer. It is likely that these factors represent important
components of job performance.

The written test factor reflects a soldier's comprehension of the
manuals, instructions, and other materials that must be read on the job.
For several of the jobs that were studied, excerpts from technical manuals
and other learning aids were incorporated into the written knowledge tests.
It is likely that a soldier who had difficulty reading and comprehending
these materials during Project A performance testing also would have
difficulty using these written materials on the job.

The rating factor represents raters' global impressions of soldiers.
It is similar to what many researchers might term "halo error" (cf.
reference, 19xx). There is, however, no proof that this rating factor truly
is error. It is equally possible that the global impression represented by
the rating factor is an important measure of soldier effectiveness. The
Project A data base provides an opportunity to study the relationships
between this rating factor and individual difference variables from several
domains.

Table 8 shows the multiple correlations between the predictors within
each domain and the two method factors. The mean Rs for the written test
factor are much greater than the mean Rs for the rating factor across all
six predictor domains.

[Table illegible in the source scan.]

The best predictors of the written test factor were the general
cognitive ability composites (mean R = .62). Across the nine jobs the ASVAB
verbal composite was the most consistent predictor of the written test
factor. The spatial ability composite and the perceptual-psychomotor
ability composites had mean correlations of .55 and .54, respectively.
Correlations were lower for the composites within the three non-cognitive
domains. However, the mean correlations were not trivial, ranging from .21
for the temperament/personality composites to .32 for the vocational
interest composites. This pattern of correlations contributes additional
evidence that this factor represents a soldier's proficiency at reading
job-related materials.

The best predictors of the rating factor were the
temperament/personality composites (mean R = .18). Within the
temperament/personality domain, the most consistent predictor of the rating
factor was the ABLE dependability composite. After the
temperament/personality composites, the second best predictors were the
general cognitive ability composites (mean R = .15). Mean correlations for
the composites within the remaining four domains all were less than .10.
This pattern of correlations suggests that the rating factor taps
dependability on the job, but much more evidence would be needed to confirm
this interpretation.

For Table 9, the predictor composites again were grouped into two sets.
For the written test factor, the mean Rs across the nine jobs were .64 for
the 11 cognitive composites, .40 for the 13 non-cognitive composites,
and .65 across all 24 predictor composites. For the rating factor, the mean
Rs were .16, .22, and .26, respectively.

The pattern of correlations for the rating factor is similar to the
pattern for the Effort and Leadership performance construct (see Table 5).
This suggests that the rating factor obtained in this study reflects raters'
global impressions of soldiers' overall competency and dependability. That
is, when raters were asked to evaluate a soldier on a particular rating
dimension, they considered the soldier's performance on that dimension and
two other factors as well. The first factor was their general impression of
how well the soldier was capable of performing the job. The second was
their general impression of the soldier's dependability.

Another method of studying the two method factors is to examine how the
pattern of predictor-criterion relationships changes when the variance
attributable to the method factors is removed from the five performance
construct scores. These results are presented in Table 10.

The validity coefficients presented for the "raw" performance construct
scores in Table 10 are the same as those presented in Table 4. To compute
residual performance construct scores, the variance attributable to the
written test factor was partialed from the scores for Core Technical
Proficiency and General Soldiering Proficiency, and the variance
attributable to the rating factor was partialed from the scores for Effort
and Leadership, Personal Discipline, and Physical Fitness and Military
Bearing. (Written knowledge tests were not used in computing scores for
Effort and Leadership, Personal Discipline, or Physical Fitness and Military
Bearing. Nor were rating scales used in computing scores for Core Technical
Proficiency or General Soldiering Proficiency.)
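
One common way to partial a factor from a score is to regress the score on
the factor and keep the residual. The sketch below shows that approach
under the assumption of a single method factor and ordinary least squares;
the paper does not describe its computations at this level of detail, and
the function names are hypothetical.

```python
from math import sqrt


def residualize(scores, factor):
    """Return scores with the linear effect of the factor removed.

    Each residual is the deviation of a score from the value predicted by a
    least-squares regression of the scores on the factor.
    """
    n = len(scores)
    mean_s = sum(scores) / n
    mean_f = sum(factor) / n
    sxy = sum((f - mean_f) * (s - mean_s) for f, s in zip(factor, scores))
    sxx = sum((f - mean_f) ** 2 for f in factor)
    slope = sxy / sxx
    return [(s - mean_s) - slope * (f - mean_f) for s, f in zip(scores, factor)]


def pearson(x, y):
    """Pearson correlation between two equal-length lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)
```

By construction the residual scores are uncorrelated with the factor, so
any correlation that remains between a predictor and the residual score
cannot be attributed to the linear effect of that factor.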

The table shows that the residual scores for Core Technical Proficiency
and General Soldiering Proficiency were much less predictable than the raw
scores. This was true across all six predictor domains. The decrease in
the mean R was greater for the cognitive predictor domains than for the
non-cognitive predictor domains.


Table 10

Mean Validity for the Composite Scores within Each Predictor Domain across
Nine Army Enlisted Jobs for "Raw" and "Residual" Job Performance Construct
Scores

[Table body illegible in the source scan.]

Validity coefficients were corrected for range restriction and adjusted for
shrinkage. K is the number of predictor composites.

For Effort and Leadership, the cognitive predictor composites predicted
the residual performance construct scores better than they predicted the raw
performance construct scores. For example, the mean R of the general
cognitive ability composites with the raw Effort and Leadership score
was .31, while the mean R with the residual Effort and Leadership score
was .46. Thus, the mean R was .15 higher for the residual score than for
the raw score. The increase was .16 for the spatial ability composite (mean
R = .41 for residual Effort and Leadership and .25 for raw Effort and
Leadership) and .12 for the perceptual-psychomotor ability composites (mean
R = .38 and .26 for residual and raw Effort and Leadership scores,
respectively).

For the temperament/personality composites, the results were the
opposite. The mean multiple correlation of the temperament/personality
composites with the raw Effort and Leadership score was .33, while the mean
R with the residual score was .31.

The vocational interest composites and the job reward preference
composites actually "behaved" similarly to the cognitive ability composites.
For both predictor domains, the mean R was greater for the residual Effort
and Leadership score than for the raw Effort and Leadership score.

This pattern of correlations for Effort and Leadership suggests two
interesting conclusions. First, the pattern provides additional evidence
that the vocational interest composites are more similar to cognitive
predictors than to temperament/personality predictors.

Second, the changes in the pattern of correlations between raw and
residual scores suggest that Effort and Leadership becomes more like a "can
do" performance construct when the rating method factor is partialed from
the raw score. The mean multiple correlations between the residual Effort
and Leadership score and the cognitive predictor composites are very similar
to the mean Rs between the two residual proficiency construct scores and the
cognitive predictor composites. On the other hand, the residual Effort and
Leadership score has a much higher correlation with the
temperament/personality composites than the two residual proficiency
construct scores have (mean R = .31 for Effort and Leadership, .22 for Core
Technical Proficiency, and .21 for General Soldiering Proficiency). This
indicates that, even after the rating factor is partialed from the raw
Effort and Leadership score, the residual Effort and Leadership score
continues to reflect the "will do" portion of the job performance space.
Thus, the residual Effort and Leadership score appears to tap both "can do"
or maximal job performance and "will do" or typical job performance.

Partialing the rating factor from the Personal Discipline and Physical
Fitness and Military Bearing scores had little impact on the correlations of
these scores with the predictor composites. None of the correlations for
these two performance constructs changed by more than .04 when residual
scores were used instead of raw scores.


Summary and Conclusions


The pattern of predictor-criterion relationships presented in this
paper was consistent with the pattern that was expected. Cognitive
predictors provided excellent prediction of Core Technical Proficiency and
General Soldiering Proficiency. Across nine very different jobs, the mean R
for the complete set of 11 cognitive composite scores was .65 for Core
Technical Proficiency and .69 for General Soldiering Proficiency. Clearly,
cognitive predictors provide excellent prediction of job proficiency for
Army enlistees. Non-cognitive predictors (specifically,
temperament/personality predictors) were the best predictors of Personal
Discipline and Physical Fitness and Military Bearing. The best prediction of
Effort and Leadership was obtained when both cognitive and non-cognitive
predictors were used.

The predictor-criterion relationships uncovered enhanced understanding
of both the predictor space and the job performance space. On the predictor
side, the vocational interest composites provided surprisingly good
prediction of Core Technical Proficiency and General Soldiering Proficiency.
In retrospect, these correlations often made perfectly good sense. For
example, as Wise, Campbell, and Peterson (1987) note, the combat-related
interest composite was correlated with scores on General Soldiering
Proficiency, which represents performance on common soldiering tasks. The
combat-related interest composite also was correlated with Core Technical
Proficiency scores in the three combat jobs studied (Infantryman, Cannon
Crewmember, and Armor Crewman). In retrospect, these correlations often
made perfectly good sense, as research results often do, in retrospect.
In this case, the results suggest that people who are interested in their
work are more likely to perform well on their job than people who are not
interested in their work. This certainly is not surprising, in retrospect.

On the criterion side, the pattern of predictor-criterion correlations
helped add to our confidence in the construct validity of the job
performance scores. The pattern of correlations also enhanced understanding
of the Effort and Leadership construct, the written test and rating method
factors, and the relationship between raw and residual performance construct
scores.

The correlations of the vocational interest and job reward preference
composites with the "will do" performance criteria point to one weakness of
the Project A criterion measures. The best criteria for these predictors
would be some measure of job satisfaction. In future Project A validation
research, we will include job satisfaction measures in our assessment.


References 


Campbell, J. P., Harris, J. H., McHenry, J. J., & Arabian, J. (1987, April).
Analysis of criterion measures: The modeling of performance. Paper
presented at the Second Annual Conference of the Society for Industrial
and Organizational Psychology, Atlanta, GA.

Claudy, J. G. (1978). Multiple regression and validity estimation in one
[remainder of title and journal name illegible], 2, 4, 595-601.

Lord, F. M., & Novick, M. R. (1968). Statistical theories of mental test
scores. Reading, MA: Addison-Wesley Publishing Company, Inc.

Mitchell, K. J., Hanser, L. M., & Grafton, F. C. (1984). The
[title partly illegible] classification standards in the Army
(ARI-RS-[illegible]-84-13). Alexandria, VA: U.S. Army Research Institute
for the Behavioral and Social Sciences.

Peterson, N. G., Hough, L. M., Dunnette, M. D., Rosse, R. L., Houston,
J. S., Toquam, J. L., & Wing, H. (1987, April). Identification of
predictor constructs and development of new selection/classification
tests. Paper presented at the Second Annual Conference of the Society
for Industrial and Organizational Psychology, Atlanta, GA.

Young, W. Y., Harris, J. H., Hoffman, R. G., & Houston, J. S. (1987, April).
[Beginning of title illegible] scale data collection and data base
preparation. Paper presented at the Second Annual Conference of the
Society for Industrial and Organizational Psychology, Atlanta, GA.

Wing, H., Peterson, N. G., & Hoffman, R. G. (1984, August). Expert
judgments of predictor-criterion validity relationships. Paper
presented at the 92nd Annual Convention of the American Psychological
Association, Toronto, Canada.

Wise, L. L., Campbell, J. P., & Peterson, N. G. (1987, April). Identifying
[remainder of title illegible]. Paper presented at the Second Annual
Conference of the Society for Industrial and Organizational Psychology,
Atlanta, GA.


IDENTIFICATION  OF  PREDICTOR  CONSTRUCTS  AND 
DEVELOPMENT  OF  NEW  SELECTION/CLASSIFICATION  TESTS 


Norman G. Peterson
Leaetta M. Hough
Marvin D. Dunnette
Rodney L. Rosse
Janis S. Houston
Jody L. Toquam

Personnel  Decisions  Research  Institute 


Hilda  Wing 

U.S.  Army  Research  Institute 


Presented  at  the  Annual  Conference  of  the 
Society  for  Industrial  and  Organizational  Psychology 

Atlanta,  Georgia 

April  1987 


The views expressed in this paper are those of the authors and do not
necessarily reflect the official opinions and policies of the U.S. Army
Research Institute or the Department of the Army.




The distinguished, earlier speakers have elegantly and accurately
described the scope and purpose of Project A. Before plunging into the main
topic of this paper, however, we would like to present a few figures that
show the real questions that had to be addressed by Project A.

Figure  1  shows,  at  first  blush,  what  appeared  to  be  the  major  question 
to  be  answered.  Well,  I  have  to  admit,  this  question  had  some  of  us  a  little 
bit  worried. 

Imagine our relief then, when we discovered that the real question to be
answered was the one shown in Figure 2.

Finally,  Figure  3  shows  the  question  posed  for  the  predictor  team  of 
Project  A.  Well,  by  now  we  were  down  to  a  question  that  any  reality-grounded 
psychologist  could  really  be  afraid  of  tackling. 

Anyway, that was where we began. In the remainder of this paper, we
present an overview of the process followed to address the question in Figure
3, and the battery of tests developed through that process.


APPROACH  AND  RESEARCH  DESIGN 


Theoretical  Approach 


At present, the U.S. Army has a large number of jobs (called Military
Occupational Specialties or MOS) and hires, almost exclusively,
inexperienced and untrained persons to fill those jobs. As obvious as these
facts are, they need to be stated because they are the overriding facts that
have to be addressed by the predictor team on Project A.

One implication of these facts is that a highly varied set of individual
differences variables must be put into use if there is to be a reasonable
chance of improving the present level of accuracy of predicting training
performance, job performance, and attrition/retention in a substantial
proportion, if not all, of those jobs. Much less evident is the particular
content of that set of individual differences variables, and the way the set
should be developed and organized.

A second, and perhaps less obvious, implication is the notion that new
predictor measures must be appropriate for selecting persons who do not have
the training and experience to begin immediately performing their assigned
jobs. This is true partly because of the vast numbers of job positions that
need to be filled, partly because of the kinds of jobs found in the Army
(Infantry, Artillery, etc.), and partly because of the population of persons
that the Army draws from (young high-school graduates with little or no
specialized training and job experience).

These considerations led us to adopt a construct-oriented strategy of
predictor development, but with a healthy leavening from the content-oriented
strategy. Essentially, we endeavored to build up a model of predictor space
by (1) identifying the major, relatively independent domains or types of
individual differences constructs that existed; (2) selecting measures of
constructs within each domain that met a number of psychometric and pragmatic
criteria; and (3) further selecting those constructs that appeared to be the
"best bets" for incrementing (over present predictors) the prediction of the
set of criteria of concern (i.e., training/job performance and
attrition/retention in Army jobs).


Figure 1. Project A: Question To Be Answered


Figure 2. Project A: Question To Be Answered

Ideally, the model would, we hoped, lead to the selection of a finite
set of relatively independent predictor constructs that were also relatively
independent of present predictors and maximally related to the criteria of
interest. If these conditions were met, then the resulting set of measures
would predict all or most of the criteria, yet possess enough heterogeneity
to yield powerful, efficient classification of persons into different jobs.

The development of such a model also had the virtue that it could be at
least partially "tested" at many points during the research effort, and not
just at the end, when all the predictor and criterion data are in. For
example, we could examine the covariance of newly developed measures with one
another and with the present predictors, notably the Armed Services
Vocational Aptitude Battery (ASVAB). If the new measures were not relatively
independent of the ASVAB and measures from other domains as predicted by the
model, then we could take steps to correct that. Also, by constructing such
a visible model, we thought that modifications and improvements could be
implemented much more straightforwardly.

Figure 4 shows an illustrative, construct-oriented model and is
presented in order to represent the model in abstract. Note that both the
criterion and the predictor space are depicted. As mentioned earlier, a
great deal of the work of Project A is devoted to the development of
criterion measures, and we, on the predictor side, have taken advantage of
the information coming from those efforts as it has become available.

If this illustrative model were to be developed and tested with data,
then the network of relationships on the predictor side, on the criterion
side, and between the two could be confirmed, disconfirmed, and/or modified.
It is imperative that the development of such models be done very carefully
and conservatively, and subjected frequently to reality testing; we have kept
this firmly in mind. However, the possession of such a model enables one to
state fairly clearly why such and such a predictor is being researched, and
to check quickly, at least rationally, whether the addition of a predictor is
likely to improve prediction.

Finally, the model is depicted as a matrix with a hierarchical arrangement
of both the rows and the columns. We have found it useful to employ
this hierarchical notion, because it allows us to think in terms of
appropriate levels of specificity for a particular problem as we do the
research, or for future applications of measures.


Figure 4. Illustrative construct-oriented model. (Matrix of predictor
constructs by criteria. Predictor rows: Cognitive: Verbal, Numerical,
Spatial; Psychomotor: Precision, Coordination, Dexterity; Temperament:
Dependability, Dominance, Sociability. Cell entries denote the expected
strength of relationship: High, Medium, Low.)


Research Objectives - Destinations Along the Way

This theoretical approach led to the delineation of seven more concrete
objectives of our research. These were:

1. Identify measures of human abilities, attributes, or characteristics
which are most likely to be effective in predicting, prior to
entry into the organization, successful performance in general, and
in classifying persons into jobs where they will be most successful,
with special emphasis on attributes not tapped by current
preinduction measures.

2.  Design  and  develop  new  measures  or  modify  existing  measures  of 
these  "best  bet"  predictors. 

3.  Develop  materials  and  procedures  for  efficiently  administering 
experimental  predictor  measures  in  the  field. 

4.  Estimate  and  evaluate  the  reliability  of  the  new  preinduction 
measures  and  their  vulnerability  to  motivational  set  differences, 
faking,  variances  in  administrative  settings,  and  practice  effects. 

5.  Determine  the  interrelationships  (or  covariance)  between  the  new 
preinduction  measures  and  current  preinduction  measures. 

6. Determine the degree to which the validity of new preinduction
measures generalizes across jobs; that is, proves useful for
predicting measures of successful performance across quite different
jobs and, conversely, the degree to which the measures are useful
for classification or the differential prediction of success across
jobs.

7. Determine the extent to which new preinduction measures increase
the accuracy of prediction of success and the accuracy of
classification into jobs over and above the levels of accuracy reached by
current preinduction measures.
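
Objective 7 asks how much the new measures add over and above the current ones. One conventional way to quantify this, offered here only as a minimal sketch and not as the analysis actually used in Project A, is to compare the squared multiple correlation of a regression on the current predictors with that of a regression on current plus new predictors. The function names, the simulated data, and the effect sizes below are all hypothetical.

```python
import numpy as np

def r_squared(predictors, criterion):
    """Squared multiple correlation from an ordinary least-squares fit."""
    design = np.column_stack([np.ones(len(criterion)), predictors])
    beta, *_ = np.linalg.lstsq(design, criterion, rcond=None)
    residuals = criterion - design @ beta
    return 1.0 - residuals.var() / criterion.var()

def incremental_validity(current, new, criterion):
    """Return R^2 for current predictors, for current plus new, and the gain."""
    r2_current = r_squared(current, criterion)
    r2_both = r_squared(np.column_stack([current, new]), criterion)
    return r2_current, r2_both, r2_both - r2_current

# Simulated (hypothetical) data: two "current" scores, one "new" score.
rng = np.random.default_rng(0)
n = 2000
current = rng.normal(size=(n, 2))
new = rng.normal(size=(n, 1))
criterion = 0.5 * current[:, 0] + 0.3 * new[:, 0] + rng.normal(size=n)

r2_current, r2_both, gain = incremental_validity(current, new, criterion)
print(f"R2 current={r2_current:.3f}  R2 both={r2_both:.3f}  gain={gain:.3f}")
```

Because the simulated new score is independent of the current scores, the gain here reflects the unique contribution of the new measure; in operational data the gain would shrink as the new measure overlaps with the current battery.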

Research  Design  -  The  Road  Map 

To  achieve  these  objectives,  we  have  followed  the  design  depicted  in 
Figure  5. 

Several  things,  we  feel,  are  noteworthy  about  the  design.  First,  five 
test batteries are mentioned: Preliminary Battery, Demo Computer Battery,
Pilot  Trial  Battery,  Trial  Battery,  and  Experimental  Battery.  These  appear 
successively  in  time  and  allow  us  to  modify  and  improve  our  predictors  as  we 
gather  and  analyze  data  on  each  successive  battery  or  set  of  measures. 

Second,  a  large-scale  literature  review  and  a  quantified  expert  judgment 
process  were  used  early  in  the  project  to  take  maximum  advantage  of  earlier 
research  and  accumulated  knowledge  and  expert  opinion.  The  expert  judgment 
process  was  used  to  develop  an  early  model  of  both  the  predictor  space  and 
the  criterion  space  and  relied  heavily  on  the  information  gained  from  the 
literature  review.  By  using  the  model  that  resulted  from  analyses  of  the 
experts'  judgments  of  the  relationships  between  predictor  constructs  and 
criterion  dimensions,  we  were  able  to  develop,  carefully  and  efficiently, 
measures  of  the  most  promising  predictor  constructs. 

Third, the design includes both predictive (for the Preliminary and
Experimental Batteries) and concurrent (for the Trial Battery) validation
modes of data collection, although that is not obvious from Figure 5. Thus,
we are able to benefit from the advantages of both types of design: early
collection and analysis of empirical criterion-related validities
in the case of the concurrent design, and less concern about range
restriction and experiential effects in the predictive design.


Figure 5. Flow chart of predictor measure development activities of
Project A.

Organization 

We organized predictor researchers into three "domain teams" as we
worked our way through this research design and toward the earlier described
research objectives. One team concerned itself with the temperament,
biographical data, and vocational interest variables and came to be called the
"non-cognitive" team. Another team concerned itself with cognitive and
perceptual kinds of variables and was called the "cognitive" team. The third
team concerned itself with psychomotor and perceptual variables and was
labeled the "psychomotor" team or sometimes the "computerized" team since all
the measures developed by that team were computer-administered.

Another important component in the organization was the set of
scientific advisors assigned to oversee and assist us, particularly Lloyd
Humphreys and Jay Uhlaner. These scientists met frequently with us and
advised us at critical decision points. The experience and wisdom they
brought to the team were extremely valuable.


INITIAL  RESEARCH  STEPS 

The  overriding  purpose  of  the  literature  review  was,  simply  put,  to  make 
maximum  use  of  earlier  research  on  the  problem  of  accurately  predicting  job 
performance  and  classifying  persons  into  jobs  in  such  a  way  that  both  the 
person  and  the  organization  receive  maximum  benefits.  More  specifically,  we 
wished  to  identify  those  variables  or  constructs,  and  their  measures,  that 
had  proven  effective  for  such  purposes.  As  Figure  5  shows,  the  information 
obtained from the literature review was used in all the immediately
succeeding research activities.

The  search  was  conducted  by  the  three  research  teams,  each  responsible 
for  a  fairly  broadly  defined  area  of  human  abilities  or  characteristics: 
cognitive  abilities;  non-cognitive  characteristics  such  as  vocational 
interests,  biographical  data,  and  measures  of  temperament;  and  psychomotor/ 
physical  abilities. 

The  literature  search  was  conducted  in  late  1982  and  early  1983.  Within 
each of the three areas, the teams carried out essentially the same steps:

1.  Compile  an  exhaustive  list  of  possibly  relevant  reports,  articles, 
books,  or  other  sources  using  computerized  data  base  searches, 
existing  bibliographies,  and  consultation  of  experts. 

2.  Review  each  source  and  determine  its  relevancy  for  the  project  by 
examining  the  title  and  abstract  (or  other  brief  review). 

3.  Obtain  the  sources  identified  as  relevant  in  the  second  step. 




4. For relevant materials, carry out a thorough review and transfer
relevant information onto two special review forms developed for
the project.

Across all three ability areas, more than 10,000 sources were identified
via the computer search. (Of course, many of these sources were identified
as relevant in more than one area, and were thus counted more than once.)

The special review forms and the actual sources that had been located
were used in two primary ways. First, three working documents were written,
one for each of the three areas. (These documents were put into research
note form: Hough, Kamp, & Barge, in press; Toquam, Corpe, & Dunnette, in
press; McHenry & Rose, in press.) These documents identified and summarized
the literature with regard to issues important to the research being
conducted, the most appropriate organization or taxonomy of the constructs in
each area, and the validities of the various measures for different types of
job performance criteria. Second, the predictors identified in the review
were subjected to further, structured scrutiny in order to (1) select tests
and inventories to make up the Preliminary Battery, and (2) select the "best
bet" predictor constructs to be used in the expert judgment research
activity.

Expert  Judgments 

The approach used in the expert judgment process was to (1) identify
criterion categories, (2) identify an exhaustive range of psychological
constructs that may be potentially valid predictors of those criterion
categories, and (3) obtain expert judgments about the relationships between
the two. Schmidt, Hunter, Croll, and McKenzie (1983) showed that pooled
expert judgments, obtained from experienced personnel psychologists, were as
accurate in estimating the validity of tests as actual, empirical
criterion-related validity research using samples of hundreds of subjects.
That is, experienced personnel psychologists are effective "validity
generalizers" for cognitive tests. They do tend to underestimate slightly the
true validity as obtained from empirical research.

Hence, one way to identify the "best bet" set of predictor variables
and measures is to use a formal judgment process employing experts such as
that followed by Schmidt et al. (1983). Descriptive information about a set
of predictors and the job performance criterion variables is given to
"experts" in personnel selection and classification, typically personnel
psychologists. These experts estimate the relationships between predictor
and criterion variables by rating or directly estimating the value of the
correlation coefficients.

The result is a matrix with predictor and criterion variables as the
columns and rows, respectively. Cell entries are experts' estimates of the
degree of relationship between the particular predictors and various
criteria. The interrater reliability of the experts' estimates is checked
first. If the estimates are sufficiently reliable (previous research shows
values in the .80 to .90 range for about 10 to 12 experts), the matrix of
predictor-criterion relationships can be analyzed and used in a variety of
ways. By correlating the columns of the matrix, the covariances of the
predictors can be estimated on the basis of the profiles of their estimated
relationships with the criteria. These covariances can then be factor
analyzed to identify predictors that function similarly in predicting
performance criteria. Similarly, the criterion covariances can be examined to
identify clusters of criteria predicted by a common set of predictors.
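
The procedure just described can be sketched in a few lines of code. The sketch below is illustrative only: the ratings are simulated, the reliability of the pooled judgments is estimated with the Spearman-Brown formula applied to the mean inter-expert correlation (one common choice, assumed here rather than taken from the paper), and an eigenvalue decomposition stands in for a full factor analysis. All names and numbers are hypothetical.

```python
import numpy as np

def pooled_reliability(ratings):
    """Spearman-Brown reliability of the mean rating across experts.

    ratings has shape (experts, criteria, predictors).
    """
    k = ratings.shape[0]
    flat = ratings.reshape(k, -1)            # one row of cell ratings per expert
    r = np.corrcoef(flat)                    # expert-by-expert correlations
    mean_r = r[np.triu_indices(k, k=1)].mean()
    return k * mean_r / (1.0 + (k - 1) * mean_r)

def predictor_similarity(ratings):
    """Correlate predictor columns of the criteria-by-predictors cell means."""
    cell_means = ratings.mean(axis=0)        # rows: criteria, columns: predictors
    return np.corrcoef(cell_means, rowvar=False)

# Simulated judgments: 12 experts, 20 criteria, 6 predictors in two groups.
rng = np.random.default_rng(1)
n_experts, n_criteria, n_predictors = 12, 20, 6
profiles = rng.normal(size=(n_criteria, 2))  # two underlying validity profiles
pattern = np.array([[1, 0], [1, 0], [1, 0], [0, 1], [0, 1], [0, 1]], dtype=float)
true_validity = profiles @ pattern.T         # criteria x predictors
noise = rng.normal(scale=0.8, size=(n_experts, n_criteria, n_predictors))
ratings = true_validity + noise

reliability = pooled_reliability(ratings)
similarity = predictor_similarity(ratings)
eigenvalues = np.linalg.eigvalsh(similarity)[::-1]   # largest first

print(f"reliability of cell means: {reliability:.2f}")
print("eigenvalues:", np.round(eigenvalues, 2))
```

In this simulation, predictors that share a profile of judged validities across the criteria correlate highly with one another, and the leading eigenvalues of the similarity matrix point to the number of predictor clusters.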

Such procedures help identify redundancies and overlap in the predictor
set. The common sets or clusters of predictors and of criteria are an
important product for several reasons. Most importantly here, these
clusters provide a model or theory of predictor-criterion performance space.
This model serves as an informative guide to development of a set of
predictors that should be efficient and valid, at least insofar as the informed
opinion of knowledgeable experts can propel one in that direction.

To carry out the expert judgment activity, we had to identify predictor
and criterion variables and prepare materials that would enable the experts
to provide reliable estimates of validity. Time does not permit a
description of these activities. The predictor team identified the predictor
variables, while the Project A criterion team(s) identified the criterion
variables.

In the end, we had 35 experts rate the validity of 53 predictor
variables for 72 criterion variables, using materials and instructions prepared
by us. Results showed that the means of the predictor-criterion validity
judgments (cell means) were highly reliable (.96), and factor analysis
revealed eight predictor factors that summarized the judgments of the
experts. Scrutiny of these findings resulted in the hierarchical model shown
in Figure 6.

The expert judgment task, then, resulted in a hierarchical model of
predictor space that served as a guide for the development of new, preinduction
measures (the Pilot Trial Battery, see Figure 5) for Army enlisted ranks.
(Wing, Peterson, and Hoffman, 1984, provide a detailed presentation of the
expert judgment process and results.)

Preliminary  Battery 

The Preliminary Battery (PB) was conceived of as a set of proven
"off-the-shelf" measures of predictors that overlapped very little with the
Army's current pre-induction predictors. The collection of data on a number
of predictors that represent the types of predictors not currently in use by
the Army would allow an early determination of the extent to which such
predictors contributed unique variance, that is, measured attributes not
measured by current pre-induction predictors. This information would be
useful for guiding the development of new predictors into areas most likely
to be useful for increasing the accuracy of prediction and classification.

Also, the collection of predictor data (from soldiers in training) early
in the project allowed the conduct of a predictive validity investigation
much earlier in the project than if we were to wait until the Trial Battery
was developed.




(Levels, from outermost to innermost: FACTOR, Cluster, numbered Constructs)

COGNITIVE ABILITIES
    Verbal Ability/General Intelligence
        1. Verbal Comprehension
        5. Reading Comprehension
        16. Ideational Fluency
        18. Analogical Reasoning
        21. Omnibus Intelligence/Aptitude
        22. Word Fluency
    Reasoning
        4. Word Problems
        8. Inductive Reasoning: Concept Formation
        10. Deductive Logic
    Number Ability
        2. Numerical Computation
        3. Use of Formula/Number Problems
    Perceptual Speed and Accuracy
        12. Perceptual Speed and Accuracy
    Investigative Interests
        49. Investigative Interests
    Memory
        14. Rote Memory
        17. Follow Directions
    Closure
        19. Figural Reasoning
        23. Verbal and Figural Closure

VISUALIZATION/SPATIAL
    Visualization/Spatial
        6. Two-dimensional Mental Rotation
        7. Three-dimensional Mental Rotation
        9. Spatial Visualization
        11. Field Dependence (Negative)
        15. Place Memory (Visual Memory)
        20. Spatial Scanning

INFORMATION PROCESSING
    Mental Information Processing
        24. Processing Efficiency
        25. Selective Attention
        26. Time Sharing

MECHANICAL
    Mechanical Comprehension
        13. Mechanical Comprehension
    Realistic vs. Artistic Interests
        48. Realistic Interests
        51. Artistic Interests (Negative)

PSYCHOMOTOR
    Steadiness/Precision
        28. Control Precision
        29. Rate Control
        32. Arm-hand Steadiness
        34. Aiming
    Coordination
        27. Multilimb Coordination
        35. Speed of Arm Movement
    Dexterity
        30. Manual Dexterity
        31. Finger Dexterity
        33. Wrist-Finger Speed

SOCIAL SKILLS
    Sociability
        39. Sociability
        52. Social Interests
    Enterprising Interests
        50. Enterprising Interests

VIGOR
    Athletic Abilities/Energy
        36. Involvement in Athletics and Physical Conditioning
        37. Energy Level
    Dominance/Self-esteem
        41. Dominance
        42. Self-esteem

MOTIVATION/STABILITY
    Traditional Values/Conventionality/Non-delinquency
        40. Traditional Values
        43. Conscientiousness
        46. Non-delinquency
        53. Conventional Interests
    Work Orientation/Locus of Control
        44. Locus of Control
        47. Work Orientation
    Cooperation/Emotional Stability
        38. Cooperativeness
        45. Emotional Stability

Figure 6. Hierarchical map of predictor space.


Selection  of  Preliminary  Battery  Measures 


The literature review identified a large set of predictor measures, each
with ratings by the researchers on several psychometric and substantive
evaluation factors. These ratings were used to select a smaller set of
measures as serious candidates for inclusion in the Preliminary Battery. Two
major practical constraints came into play: (1) no apparatus or
individualized testing methods could be used because of the relatively short
time available to prepare for battery administration, and the fact that the
battery would be administered to a large number of soldiers (several
thousand) over a nine-month period by relatively unsophisticated test
administrators, and (2) only four hours were available for testing.

Predictor team researchers, and several prominent scientists outside the
predictor team, made the selection of "off-the-shelf" measures.

The Preliminary Battery included the following:

• Eight perceptual-cognitive measures

  - Five from the Educational Testing Service (ETS) French Kit
    (Ekstrom, French, and Harman, 1976)

  - Two from the Employee Aptitude Survey (EAS) (Ruch and Ruch, 1980)

  - One from the Flanagan Industrial Tests (FIT) (Flanagan, 1965)

• Eighteen scales from the Air Force Vocational Interest Career
  Examination (VOICE) (Alley and Matthews, 1982).

• Five temperament scales adapted from published scales

  - Two from the Differential Personality Questionnaire (DPQ)

  - One from the California Psychological Inventory (CPI) (Gough,
    1975)

  - The Rotter I/E scale (Rotter, 1966)

  - Validity scales from both the DPQ and the Personality Research
    Form (PRF) (Jackson, 1967)

• Owens' Biographical Questionnaire (BQ) (Owens and Schoenfeldt,
  1979). The BQ could be scored for either 11 scales for males or 14
  for females, based on Owens' research, or for 18 predesignated,
  combined-sex scales developed for this research and called Rational
  Scales. The rational scales had no item on more than one scale; some
  of Owens' scales included items on more than one scale. Items
  tapping religious or socio-economic status were deleted from Owens'
  instrument for this use, and items tapping physical fitness and
  vocational-technical course work were added.




In  addition  to  the  Preliminary  Battery,  scores  were  available  for  the 
Armed  Services  Vocational  Aptitude  Battery,  which  all  soldiers  take  prior  to 
entry into service.

Sample  and  Administration  of  Battery 

The  Preliminary  Battery  was  administered  to  soldiers  entering  Advanced 
Individual  Training  (AIT)  for  four  MOS:  05C,  Radio  Teletype  Operator  (MOS 
code was later changed to 31C); 19E/K, Armor Crewman; 63B, Vehicle and
Generator  Mechanic;  and  71L,  Administrative  Specialist.  Almost  all  soldiers 
entering  AIT  for  these  MOS  during  the  period  1  October,  1983  to  30  June,  1984 
completed  the  Preliminary  Battery.  We  are  here  concerned  only  with  the 
sample  of  soldiers  who  completed  the  battery  from  1  October,  1983  to  1 
December,  1983,  approximately  2,200  soldiers. 

Analyses 

An initial set of analyses was performed on the Preliminary Battery data
to inform the development of the Pilot Trial Battery (PTB). (The PTB was
intended to include newly developed tests and inventories that would measure
the important abilities and traits identified via the literature review and
expert judgment process. These PTB measures would be piloted and field
tested and then revised to become the Trial Battery. See Figure 5 for a flow
chart showing the sequencing of the various batteries.) We summarize those
findings here. They are more completely reported in Hough, Dunnette, Wing,
Houston, and Peterson (1984).

Three types of analyses were done. First, the psychometric
characteristics of each scale were explored to pinpoint possible problems with
the measures of the construct being measured, so those problems could be
avoided when the Pilot Trial Battery measures were developed. These analyses
included descriptive statistics, item analyses (including numbers of items
attempted in the time allowed), internal consistency reliability estimates,
and, for the temperament inventory, percentage of subjects failing the scales
intended to detect random or improbable response patterns.

Second, the covariances of the scales within and across the various
conceptual domains (i.e., cognitive, temperament, biographical data, and
vocational interest) were investigated to detect excessive redundancy among
the PB measures, especially across the domains. If such redundancies were
detected, then steps could be taken to avoid such a problem in the Pilot
Trial Battery. Third, the covariances of the PB scales with ASVAB measures
were studied to identify any PB constructs that showed excessive redundancy
with ASVAB constructs, again so that steps could be taken to alleviate such
problems for the Pilot Trial Battery. Correlation matrices and factor
analyses were the major methods of analysis for these second and third purposes.

The psychometric analyses showed some problems with the cognitive tests.
The time limits appeared too stringent for several tests, and one test
appeared to be much too difficult for the population being tested. Since
most of the cognitive tests used in the Preliminary Battery had been
developed on college samples or other samples somewhat better educated than the
population seeking entry into the Army, these findings were not unexpected.
The lesson learned was that the Pilot Trial Battery measures needed to be
accurately targeted (in difficulty of items and time limits) toward the
population of persons seeking entry into the Army. No serious problems were
unearthed for the temperament, bio-data, and interest scales. Item-total
correlations were acceptably high and in accordance with prior findings, and
score distributions were not excessively skewed or different from
expectation.

Covariance analyses showed that vocational interest scales were
relatively distinct from the biographical and temperament scales, but the latter
two types of scales showed considerable covariance. Five factors were
identified from the 40 non-cognitive scales, two that were primarily
vocational interests and three that were combinations of biographical data and
temperament scales. These findings led us to consider, for the Pilot Trial
Battery, combining biographical and temperament item types to measure the
constructs in these two areas. The five non-cognitive factors showed
relative independence from the cognitive PB tests, with the median absolute
correlations of the scales within each of the five factors with each of the
eight PB cognitive tests ranging from .01 to .21. This confirmed our
expectations of little or no overlap between the cognitive and non-cognitive
constructs.

Correlations and factor analysis of the ten ASVAB subtests and the eight
PB cognitive tests confirmed prior analyses of the ASVAB (Kass, et al., 1983)
and the relative independence of the PB tests. Although some of the ASVAB-PB
test correlations were fairly high (the highest was .57), most were
.30 or less (49 of the 80 correlations were .30 or less, 65 were .40 or less). The
factor analysis (principal factors extraction, varimax rotation) of the 18
tests showed all eight PB cognitive tests loading highest on a single factor,
with none of the ASVAB subtests loading highest on that factor. The
non-cognitive scales overlapped very little with the four ASVAB factors
identified in the factor analysis of the ASVAB subtests and PB cognitive tests.
Median correlations of non-cognitive scales with the ASVAB factors, computed
within the five non-cognitive factors, ranged from .03 to .32, but 14 of the
20 median correlations were .10 or less.
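
The kind of cross-battery summary reported above (the highest correlation, the share of correlations at or below a cutoff, and the median absolute correlation) can be computed as in the following sketch. The data are simulated and the loadings are arbitrary; nothing here reproduces the Preliminary Battery results, and the function and variable names are hypothetical.

```python
import numpy as np

def cross_correlations(battery_a, battery_b):
    """Correlate every column of battery_a with every column of battery_b."""
    a_z = (battery_a - battery_a.mean(axis=0)) / battery_a.std(axis=0)
    b_z = (battery_b - battery_b.mean(axis=0)) / battery_b.std(axis=0)
    return a_z.T @ b_z / len(battery_a)

# Simulated scores: 10 "current" subtests and 8 "new" tests that share only
# a modest general component (hypothetical loadings).
rng = np.random.default_rng(2)
n = 2200
general = rng.normal(size=n)
specific = rng.normal(size=n)
current = 0.8 * general[:, None] + 0.6 * rng.normal(size=(n, 10))
new = (0.3 * general[:, None] + 0.6 * specific[:, None]
       + 0.74 * rng.normal(size=(n, 8)))

r = cross_correlations(current, new)         # 10 x 8 matrix, 80 correlations
abs_r = np.abs(r)
summary = {
    "n_pairs": r.size,
    "highest": float(abs_r.max()),
    "share_at_or_below_.30": float((abs_r <= 0.30).mean()),
    "median_abs": float(np.median(abs_r)),
}
print(summary)
```

A low median and a large share of small correlations would indicate, as in the text, that the new tests measure something relatively independent of the current battery.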

Computer  Battery  Development 

Compared to the paper-and-pencil measurement of cognitive abilities and
the major non-cognitive variables (temperament, biographical data, and
vocational interests), the computerized measurement of psychomotor and
perceptual abilities was in a relatively primitive state. Much
work had been done in World War II using electro-mechanical apparatus, but
relatively little work had occurred since then. Microprocessor technology
held out the promise of revolutionizing measurement in this area, but the
work was (and still is) in its early stages. It was clear, however, that
cognitive ability testing was moving into a computer-assisted environment
through the methodology of adaptive testing. As Project A began, work was
under way to implement the ASVAB via computer-assisted testing methods in the
Military Entrance Processing Stations. Therefore, it was also sensible from
a practical point of view to investigate these methods of testing.

Roughly speaking, four phases of activities led up to the development of
computerized predictor measures for the Pilot Trial Battery: (1) information
gathering about past and current research in perceptual/psychomotor
measurement and computerized methods of testing such abilities; (2)
construction of a demonstration computer battery, and a continuation of
information gathering; (3) selection of commercially available
microprocessors and peripheral devices, writing of software for testing
several abilities using this hardware, and tryout of this hardware and
software; and (4) continued development of software, and design and
construction of a custom-made peripheral device, which we called a response
pedestal.

We can mention only a few high points of this part of the research. Our
visits to military laboratories that were then conducting computerized
testing taught us that large-scale testing on microprocessors could be
accomplished, that a variety of computer languages was in use, that it would
be highly desirable to have the computerized test battery be as completely
self-administering as possible, and that little information was then
available on the reliability or criterion-related validity of computerized
measures because of the recency of their development. By immediately
developing a demonstration battery of five tests, we convinced ourselves that
some computer languages did not allow enough power and control of timing
events for our purposes. We ventured into the area of portable computers,
then in its infancy, and found machines that appeared adequate for our needs:
powerful enough, but also rugged enough to withstand frequent shipping from
one field site to another.

We developed our software as "command processors," thus allowing project
scientists with no facility in computer languages to construct entire tests,
view and try out items, and revise the tests. Finally, we concluded that
responses made through standard keyboards and commercially available
joysticks were inadequate for our purposes, so we designed and had built a
customized response pedestal.


DEVELOPMENT  OF  THE  PILOT  TRIAL  BATTERY 
Identification  of  Measures 


In March 1984, a meeting of the predictor team and the scientific
advisors was held to decide on the measures to be developed for the Pilot
Trial Battery. Information from the literature review, expert judgments,
initial analyses of the preliminary battery, and the first three phases of
computer battery development was presented and discussed. Predictor team
staff made recommendations for inclusion of measures, and these were
evaluated and revised. Figure 7 shows the results of that deliberation
process. The names of the tests developed for the Pilot Trial Battery are
shown in the right-hand column of Figure 7. This set of recommendations
served as the blueprint for the predictor team's test development efforts
for the next several months.

Test  Writing  and  Pilot  Tests 

Following this meeting, we began writing items for all the instruments.
When initial versions of the instruments were complete (or, at least, nearly
so), we conducted the first pilot test.




Final
Priority*  Predictor Category               Pilot Trial Battery Test Names

Cognitive:

   7       Memory                           (Short) Memory Test - Computer
   6       Number                           Number Memory Test - Computer
   8       Perceptual Speed & Accuracy      Perceptual Speed & Accuracy - Computer
                                            Target Identification Test - Computer
   4       Induction                        Reasoning Test 1
                                            Reasoning Test 2
   5       Reaction Time                    Simple Reaction Time - Computer
                                            Choice Reaction Time - Computer
   3       Spatial Orientation              Orientation Test 1
                                            Orientation Test 2
                                            Orientation Test 3
   2       Spatial Visualization/Field
           Independence                     Shapes Test
   1       Spatial Visualization            Object Rotations Test
                                            Assembling Objects Test
                                            Path Test
                                            Maze Test

Non-Cognitive, Biodata/Temperament:         ABLE (Assessment of Background
                                            and Life Experiences)
   1       Adjustment
   2       Dependability
   3       Achievement
   4       Physical Condition
   5       Potency
   6       Locus of Control
   7       Agreeableness/Likeability
   1       Validity Scales

Non-Cognitive, Interests:                   AVOICE (Army Vocational Interest
                                            Career Examination)
   1       Realistic
   2       Investigative
   3       Conventional
   4       Social
   5       Artistic
   6       Enterprising

Psychomotor:

   1       Multilimb Coordination           Target Tracking Test 2 - Computer
                                            Target Shoot - Computer
   2       Precision                        Target Tracking Test 1 - Computer
   3       Manual Dexterity                 (None)

*Final priority arrived at via consensus of March 1984 IPR attendants.


Figure 7. Predictor categories discussed at IPR March 1984, linked to
Pilot Trial Battery test names




Data from the tryout were analyzed, and these results guided revision of
the instruments. This process was followed through three iterations for
most of the instruments.

Table 1 describes the pilot tests. Note that we included some marker
tests of the constructs for which we were developing new tests.

Field  Test 


After the third pilot test, we took a little more time to analyze the
data, revise the instruments, and prepare for a fairly comprehensive field
test of the battery. The objectives of this field test were to provide data
sufficient to evaluate the psychometric properties of the battery (including
test-retest reliability) and its degree of overlap with the ASVAB. In
addition, we collected data to allow analysis of practice effects on the
computerized measures and faking/fakability on the temperament/biodata and
interest inventories.

A sample size of about 250 was available for the psychometric analyses
of the Pilot Trial Battery, about 170 for the analyses of overlap with the
ASVAB, about 115 for test-retest analyses, about 75 for practice effects on
computers, and about 65-115 for the faking/fakability study (in each
experimental cell, total N of about 650). Data were collected primarily at
Fort Knox, Kentucky, but data for the faking/fakability study were also
collected at the Minneapolis Military Entrance Processing Station and Fort
Bragg, N.C.

With a few exceptions, the Pilot Trial Battery was psychometrically
sound, and appeared to be measuring abilities that overlapped little with the
ASVAB, especially in the temperament/biodata and interest domains. The
evaluation of practice effects on computerized test scores showed these to be
of little concern. Gain scores after practice were no higher for these tests
than those observed on paper-and-pencil tests of cognitive ability given
twice over a short period of time (two weeks or so). The gain scores for
computerized test scores ranged from nearly zero to about .4 of a standard
deviation, averaging about a quarter of a standard deviation.

We reached the following conclusions from the faking/fakability
research.

•  Soldiers can distort their responses on the temperament/biodata and
   interest inventories when instructed to do so.

•  Special response validity scales on the temperament/biodata
   inventory do detect such intentional faking on that inventory and
   can be used to adjust scores on the substantive scales so as to
   remove most of the effects of intentional faking.

•  Those special response validity scales on the temperament/biodata
   inventory are not sufficiently effective for detecting and
   adjusting faked scores on the interest inventory.

•  Applicants for the U.S. Army did not appear to be distorting their
   responses (to appear more favorably qualified). Thus, it appears
   that intentional distortion may not be a significant problem in
   Army applicants in the present volunteer Army. We could not, of
   course, collect any data that would shed light on this problem in a
   draft situation.


Table 1.

Summary of Pilot Testing Sessions for Pilot Trial Battery

Pilot                                      Total
Test                                       Sample
 #     Location        Date                Size    No./Type of Tests Administered

 1     Fort Carson     17 April 1984        43     10 New Cognitive
                                                    9 Marker Cognitive
                                                    0 New Non-Cognitive
                                                    0 Marker Non-Cognitive
                                                    7 Computerized Measures

 2     Fort Campbell   16 May 1984          57     10 New Cognitive
                                                    5 Marker Cognitive
                                                    2 New Non-Cognitive
                                                    1 Marker Non-Cognitive
                                                    0 Computerized Measures

 3     Fort Lewis      11-15 June 1984     118     10 New Cognitive
                                                    4 Marker Cognitive
                                                    2 New Non-Cognitive
                                                    0 Marker Non-Cognitive
                                                    8 Computerized Measures


THE  TRIAL  BATTERY 

Development

At the completion of the field tests, we felt we had shown the Pilot
Trial Battery to be ready for use in the concurrent validation research. We
used the data from the field test to improve the Pilot Trial Battery
measures, but we also had to shorten the battery. It required 6.5 hours to
administer the entire battery, and only 4 hours of testing time were
available.

Three general principles, consonant with the theoretical and practical
orientation that had been used since the inception of the project, guided the
revision and reduction decisions:

1.  Maximize  the  heterogeneity  of  the  battery  by  retaining  measures  of 
as  many  different  constructs  as  possible. 

2.  Maximize  the  chances  of  incremental  validity  and  classification 
efficiency. 

3.  Retain  measures  with  adequate  reliability. 

Using  all  accumulated  information,  the  final  decisions  were  made  in  a 
series  of  meetings  attended  by  the  project  staff  and  by  the  Scientific 
Advisory  Group.  Considerable  discussion  was  generated  at  these  meetings,  but 
the  group  was  able  to  reach  a  consensus  on  the  reductions  and  revisions  to  be 
made  to  the  Pilot  Trial  Battery. 

Some  tests  and  scales  were  dropped,  some  were  shortened,  and  some 
redundant  items  asking  about  soldier  demographics  were  removed.  Table  2 
shows  the  array  of  measures  that  made  up  the  Trial  Battery.  (See  Peterson 
(in  press)  for  a  complete  description  of  all  research  activities  leading  up 
through  the  development  of  the  Trial  Battery.) 

Trial  Battery  Scores 

As described earlier, the Trial Battery was administered to the large
concurrent validity sample. We also collected test-retest data (two-week
interval) on a subset of about 500 soldiers.

A total of seventy scores was generated from the Trial Battery.
Forty-three of these came from the non-cognitive inventories: the Assessment
of Background and Life Experiences (ABLE), the Army Vocational Interest
Career Examination (AVOICE), and the Job Orientation Blank (JOB), which had
been included in the AVOICE for the Pilot Trial Battery but was administered
separately for the Trial Battery. Six scores came from the six
paper-and-pencil cognitive tests. Twenty-one scores were generated from the
ten computer-administered tests. For the computer-administered tests, we
evaluated a number of alternative scoring methods, such as the use of slopes,
intercepts, and slightly different methods of computing means (primarily,
different methods of trimming items prior to computation of means). We
selected, generally speaking, the most reliable and most straightforwardly
interpreted scores.


Table 2.

Description of Measures in the Trial Battery

                                                           Time Limit
COGNITIVE PAPER-AND-PENCIL TESTS       Number of Items     (minutes)

Reasoning Test                               30               12
Object Rotation Test                         90                7.5
Orientation Test                             24               10
Maze Test                                    24                5.5
Map Test                                     20               12
Assembling Objects Test                      32               16

COMPUTER-ADMINISTERED TESTS            Number of Items     Approximate Time

Demographics                                  2                4
Reaction Time 1                              15                2
Reaction Time 2                              30                3
Memory Test                                  36                7
Target Tracking Test 1                       18                8
Perceptual Speed and Accuracy Test           36                6
Target Tracking Test 2                       18                7
Number Memory Test                           28               10
Cannon Shoot Test                            36                7
Target Identification Test                   36                4
Target Shoot Test                            30                5

NON-COGNITIVE PAPER-AND-PENCIL
INVENTORIES                            Number of Items     Approximate Time

Assessment of Background and Life
  Experiences (ABLE)                        209               35
Army Vocational Interest Career
  Examination (AVOICE)                      176               20

Table 3 shows N's, means, SD's, reliabilities, and uniqueness (from
ASVAB) coefficients for scores on the cognitive paper-and-pencil tests.
Tables 4 and 5 show similar data for the computer-administered tests. Tables
6, 7, and 8 show similar data for the ABLE, AVOICE, and JOB scale scores.
(Uniqueness coefficients are not shown for these instruments, but range from
.40 to .88, with median U² values of .79 for ABLE, .80 for AVOICE, and .57
for JOB.)
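This section does not define the uniqueness coefficient. A common
formulation, assumed here rather than taken from the text, is the score's
reliability minus its squared multiple correlation with the ASVAB subtests,
that is, U² = r_xx - R². The sketch below illustrates that assumed
definition on synthetic data; the function names and numbers are
hypothetical.

```python
import numpy as np

def squared_multiple_correlation(y, predictors):
    """R-squared from an ordinary least squares regression of y on predictors."""
    y = np.asarray(y, dtype=float)
    x = np.column_stack([np.ones(len(y)), np.asarray(predictors, dtype=float)])
    beta, *_ = np.linalg.lstsq(x, y, rcond=None)
    residuals = y - x @ beta
    ss_res = float(residuals @ residuals)
    ss_tot = float(((y - y.mean()) ** 2).sum())
    return 1.0 - ss_res / ss_tot

def uniqueness(reliability, r_squared):
    """Assumed definition: reliable variance not shared with the predictors."""
    return reliability - r_squared

# Synthetic example: a new score that shares some variance with 10 subtests.
rng = np.random.default_rng(1)
subtests = rng.normal(size=(400, 10))
new_score = 0.5 * subtests[:, 0] + rng.normal(size=400)

r2 = squared_multiple_correlation(new_score, subtests)
u2 = uniqueness(0.80, r2)   # 0.80 is a made-up reliability estimate
print(round(r2, 2), round(u2, 2))
```

Under this reading, a high U² means that most of a score's reliable variance
is not already captured by the existing battery.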


As these tables show, the battery possesses adequate to excellent
psychometric properties, with the exception of some low reliabilities on a
few computer-administered test scores. These low reliabilities primarily
occur on the proportion correct scores, and this was anticipated. The items
on these tests can almost always be answered correctly if the examinee takes
enough time. This operates to severely restrict the range on the proportion
correct scores, but increases the variance (and reliability) on the decision
time scores, as was our intention.

These Trial Battery scores were the raw material for the validation
analyses of the concurrent validity sample, on the "new predictor" side of
the equation.

[Tables 3 through 8 appeared at this point. Their bodies are not legible in
the source copy; only the following titles and notes can be recovered.]

Concurrent Validity Data Analysis: Means, Standard Deviations, and
Reliability and Uniqueness Estimates for Paper-and-Pencil Cognitive Tests

Concurrent Validity Data Analysis: Means, Standard Deviations, and
Reliability and Uniqueness Estimates for Computerized Psychomotor Tests

Notes: Test-retest reliability estimates are based on sample sizes of 468 to
487. 460 - 479 for test-retest correlations. Total group after screening for
missing data and random responding. N = ?08 - 412 for test-retest
correlation (N = 414 for Non-Random Response test-retest correlation). 409
for test-retest correlation.


To conclude, we return to the research objectives stated at the
beginning of the paper.

1.  Identify "best bet" measures: This objective has been met. As
    noted, we sifted through a mountain of literature, translating the
    information onto a common form that enabled us to evaluate constructs
    and measures in terms of several psychometric and pragmatic
    criteria. The results of that effort fed into the expert judgment
    process wherein 35 personnel psychologists provided the data
    necessary to develop our first model of the predictor space. After
    further review by experienced researchers in the Army and an advisory
    group, a set of "best bet" constructs was settled on. We also made
    some field visits to observe combat arms jobs first-hand, in addition
    to receiving criterion-side information from other Project A
    researchers; all of this information was very useful in developing
    new measures.

2.  Develop measures of "best bet" predictors: This objective was
    accomplished by following the blueprint provided from the first
    objective. We carried out many small and not-so-small sample tryouts
    of these measures as they were developed. The Trial Battery is the
    tangible product of meeting this objective.

3.  Develop procedures for efficiently administering predictor measures:
    As anyone who has done research in military settings is aware,
    soldiers' time is precious and awarded research time is not to be
    squandered. We think we have developed and implemented effective
    methods for getting maximum quality and quantity of data out of our
    data collection efforts. The favorable results we have so far
    achieved in completeness and usefulness of data are due in large
    part, we think, to the attention paid to this objective.

4.  Estimate reliability and vulnerability of measures: This objective
    has also been largely accomplished. Analyses to date indicate that
    the new measures are psychometrically sound and acceptably
    invulnerable to the various sources of measurement problems, or that
    we have devised some ways to adjust for such effects. However, more
    specifically targeted research would be useful in this area.

5.  Determine the interrelationships between the new measures and
    current preinduction measures: Work still remains on this
    objective, but the data collected to date show that the new measures
    have much variance that is not shared with the ASVAB, and that the
    across-domain shared variance is low (e.g., the new cognitive
    measures have low correlations with the non-cognitive measures).

6.  Determine the level of prediction of soldier performance,
    classification efficiency, and incremental validity of the new
    measures: Alas, other presenters at this symposium are providing
    this information, so I will now shut up and sit down.




References


Alley, W. E., & Matthews, M. D. (1982). The vocational interest career
examination. Journal of Psychology, 112, 169-193.

Ekstrom, R. B., French, J. W., & Harman, H. H. (1976). Manual for kit of
factor-referenced cognitive tests. Princeton, NJ: Educational Testing
Service.

Flanagan, J. C. (1965). Flanagan industrial test manual. Chicago: Science
Research Associates.

Hough, L. M., Dunnette, M. D., Wing, H., Houston, J. S., & Peterson, N. G.
(1984). Covariance analyses of cognitive and non-cognitive measures of Army
recruits: An initial sample of Preliminary Battery data. Presented at the
92nd Annual Convention of the American Psychological Association, Toronto.
In Eaton et al. (eds.) (1984). Improving the selection, classification, and
utilization of Army enlisted personnel: Annual report, 1984 fiscal year (ARI
Technical Report 660). Alexandria, VA: Army Research Institute.

Hough, L. M., Kamp, J. D., & Barge, B. N. (1988). Utility of temperament,
biodata, and interest assessment for predicting job performance: A review
and integration of the literature. ARI Research Note, ADA 178944.

Jackson, D. N. (1967). Personality Research Form manual. Goshen, NY:
Research Psychologists Press.

Kass, R. A., Mitchell, K. J., Grafton, F. C., & Wing, H. (1983). Factor
structure of the Armed Services Vocational Aptitude Battery (ASVAB) Forms 8,
9, and 10: 1981 Army applicant sample. Educational and Psychological
Measurement, 43, 1077-1068.

McHenry, J. J., & Rose, S. R. (1988). The validity and potential usefulness
of psychomotor ability tests for personnel selection and classification. ARI
Research Note 88-13, ADA 193558.

Owens, W. A., & Schoenfeldt, L. F. (1979). Toward a classification of
persons. Journal of Applied Psychology Monographs, 64, 569-607.

Peterson, N. G. (1987). Development and field test of the Trial Battery for
Project A. ARI Technical Report.

Rotter, J. B. (1966). Generalized expectancies for internal versus external
control of reinforcement. Psychological Monographs, 80 (1, Whole No. 609).

Ruch, F. L., & Ruch, W. W. (1980). Employee Aptitude Survey: Technical
report. Los Angeles, CA: Psychological Services, Inc.

Schmidt, F. L., Hunter, J. E., Croll, P. R., & McKenzie, R. C. (1983).
Estimation of employment test validities by expert judgment. Journal of
Applied Psychology, 68, 590-601.

Toquam, J. L., Corpe, V. A., & Dunnette, M. D. (in press). Cognitive
abilities: A review of theory, history, and validity. ARI Research Note.

Wing, H., Peterson, N. G., & Hoffman, R. E. (1984). Expert judgments of
predictor-criterion validity relationships. Presented at the 92nd Annual
Convention of the American Psychological Association, Toronto, Ontario,
Canada. In Eaton et al. (eds.) (1984). Improving the selection,
classification, and utilization of Army enlisted personnel: Annual report,
1984 fiscal year (ARI Technical Report 660). Alexandria, VA: Army Research
Institute.




AN  EXAMINATION  OF  RACE  AND  SEX  EFFECTS  ON 
PERFORMANCE  RATINGS 


Elaine D. Pulakos
American Institutes for Research


Leonard A. White
U.S. Army Research Institute

Walter C. Borman
Personnel Decisions Research Institute


Presented at the Annual Conference of the
Society for Industrial and Organizational Psychology
Atlanta, Georgia
April 1987


The views expressed in this paper are those of the authors and do not
necessarily reflect the official opinions and policies of the U.S. Army
Research Institute or the Department of the Army.




Abstract 

This research investigated the effects of rater source (peer,
supervisor), rater and ratee race (black, white, Hispanic), rater and
ratee sex, and job type on ratings collected for 39,537 Army enlisted
personnel. The results showed that race and sex did not interact in
their effects on the ratings. Although significant effects were
observed for sex, race, rater source, and job type, the proportions of
variance accounted for by these effects were minimal. Of particular
interest was that the race effects found here were considerably smaller
than those reported in a recent meta-analysis (Kraiger & Ford, 1985).
Results and implications are discussed.




An  Examination  of  Race  and  Sex  Effects  on 
Performance  Ratings 

Considerable research has investigated rater and ratee gender
and/or race effects on ratings. Unfortunately, this body of literature
has produced inconsistent findings. Regarding gender effects,
the only consistent research finding is that rater and ratee sex do not
appear to interact in their effects on evaluative judgments (e.g.,
Bartol & Butterfield, 1976; Mobley, 1982; Pulakos & Wexley, 1983).
Although significant main effects for sex are evident, many of the
studies reporting such effects have been conducted in a laboratory.
Because relatively little field research has investigated gender effects
on ratings, it is difficult to draw definitive conclusions about the
existence, or lack thereof, of sex effects in ongoing performance
appraisal situations.

Research on rater and ratee race effects on performance ratings has
also produced inconsistent findings. However, a recent meta-analysis of
ratee race effects revealed corrected mean correlations between ratee
race and ratings given by black and white raters of .183 and -.220,
respectively, indicating that both black and white raters assigned
significantly higher ratings to ratees of their own race than to ratees
of the other race (Kraiger & Ford, 1985). The meta-analysis also showed
that ratee race effects were more likely to be found in field studies in
which blacks constituted a small percentage of the workforce.

The present research investigated the effects of rater and ratee
gender/race on ratings. With few exceptions, previous field research
has lacked adequate sample sizes to support an investigation of race x
sex interactions. In addition, this research extends previous
investigations of race and sex effects on ratings in two important ways.
First, using a common set of rating scales, both peers and supervisors
rated thousands of ratees occupying 19 different jobs. It was thus
possible to examine potential differences in race and sex effects as a
function of the rating source (peer or supervisor) and the type of job
held by the ratee. Second, three levels of rater and ratee race (i.e.,
whites, blacks, and Hispanics) were included rather than only
whites and blacks, as has been characteristic of the vast majority of
research investigating race effects.

Method 

Sample 

The data reported here were collected as part of Project A, the
Army's multi-year research program to develop an improved selection and
classification system for enlisted personnel. A total of 6377
supervisors and 8174 peers evaluated first-term soldiers representing 19
jobs selected to be representative of the entire population of Army
jobs. The supervisors rated an average of 2.29 subordinates, and the
peers rated an average of 3.05 co-workers, yielding a total sample of
39,537 rater-ratee pairs. Table 1 shows a breakdown of the number of
pairs representing each rater/ratee race and rater/ratee sex
combination.
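As a quick consistency check on the sample description, the reported total
of rater-ratee pairs can be approximately recovered from the rater counts
and the rounded averages. The variable names below are illustrative, and the
tolerance assumes that the two averages were rounded to two decimal places.

```python
# Figures as reported in the Sample paragraph.
supervisors, peers = 6377, 8174
mean_rated_by_supervisor, mean_rated_by_peer = 2.29, 3.05
reported_pairs = 39537

# Reconstruct the total from the rounded averages.
approx_pairs = (supervisors * mean_rated_by_supervisor
                + peers * mean_rated_by_peer)

# Each average may be off by up to 0.005 because of rounding, so the
# reconstruction can differ from the true total by at most this much.
max_rounding_error = 0.005 * (supervisors + peers)

print(round(approx_pairs, 2), round(max_rounding_error, 2))
```

The reconstructed figure falls within the rounding tolerance of the reported
total, so the counts and averages are mutually consistent.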

Procedure 

Peer and supervisor ratings were collected on ten 7-point behavioral
rating dimensions that were developed to assess first-term soldier
effectiveness in any Army job. Raters were trained on how to use the
rating scales properly and on how to avoid several common rating errors.




A principal components analysis with a varimax rotation was used to
identify constructs underlying the performance ratings. A three-factor
solution was chosen as the most psychologically meaningful, with the
factors named and defined as shown in Table 2. For each ratee, three
unweighted composites were calculated using the dimension ratings that
had the highest loadings on each factor. Alpha coefficients for the
composite scores were: Technical Skill and Job Effort (.88), Personal
Discipline (.80), and Military Bearing (.62). Intercorrelations among
the composites ranged from .51 to .74.
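The composite scoring and alpha coefficients described above can be
illustrated with a short sketch. It uses the standard formula for
coefficient alpha; taking the unweighted composite as a simple mean of the
dimension ratings is an assumption (a sum would serve equally well), and the
ratings shown are invented.

```python
import numpy as np

def unweighted_composite(ratings):
    """Unit-weighted composite: the mean of a ratee's dimension ratings."""
    return np.asarray(ratings, dtype=float).mean(axis=1)

def cronbach_alpha(ratings):
    """Coefficient alpha for a ratees-by-dimensions matrix of ratings."""
    x = np.asarray(ratings, dtype=float)
    k = x.shape[1]
    item_variances = x.var(axis=0, ddof=1)
    total_variance = x.sum(axis=1).var(ddof=1)
    return float(k / (k - 1) * (1.0 - item_variances.sum() / total_variance))

# Hypothetical 7-point ratings: 6 ratees on 3 dimensions of one factor.
ratings = [
    [5, 6, 5],
    [3, 3, 4],
    [6, 7, 6],
    [2, 3, 2],
    [4, 4, 5],
    [7, 6, 7],
]

composites = unweighted_composite(ratings)
alpha = cronbach_alpha(ratings)
print(np.round(composites, 2), round(alpha, 2))
```

Alpha rises with both the number of dimensions in a composite and their
average intercorrelation, so a composite built from few dimensions can show
a lower alpha even when those dimensions are moderately correlated.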

Results 

A preliminary multivariate analysis of variance (MANOVA) revealed
that race and sex did not interact in their effects on the three
composites. Hence, race and sex effects were examined separately in
subsequent analyses. Preliminary MANOVAs were also performed to examine
job-type x race and job-type x sex interactions. Although significant
interactions resulted, the rating variance accounted for by these was
minimal (i.e., less than one-half of one percent). Thus, job-type was
excluded from further analyses.

Race  Effects  on  the  Ratings 

A  2  (Rating  Source)  x  3  (Rater  Race)  x  3  (Ratee  Race)  MANOVA  was 
conducted  to  examine  race  effects  on  the  three  rating  measures.  The 
levels  of  rating  source  were  peer  and  supervisor,  and  the  levels  of  race 
were  black,  white,  and  Hispanic.  Upon  obtaining  a  significant  MANOVA, 
univariate  analyses  were  examined  for  each  dependent  measure.  These 
results  are  shown  in  Table  3.  The  means  for  all  rater  group  by  race 
combinations  are  shown  in  Table  4. 




With the exception of the ratee race main effect for the Bearing
factor, individual effects accounted for substantially less than one
percent of the rating variance. In fact, the total proportions of
variance accounted for by all rater source and race effects were quite
small (i.e., Technical Skill and Job Effort, r² = .016; Personal
Discipline, r² = .003; and Military Bearing, r² = .028). Because of the
minimal variance accounted for, interpretation of the interactions will
not be undertaken. It is interesting to note, however, that the nature
of the effects was not consistent across the three rating factors. For
instance, blacks were rated higher than whites on Military Bearing but
lower than whites on the other two dimensions.
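The distinction drawn above, between an effect being statistically
significant and the proportion of variance it accounts for, can be
illustrated with a one-way sketch. The paper's figures come from factorial
analyses whose exact computation is not described here, so the eta-squared
function below is a simplified stand-in, and the data are synthetic.

```python
import numpy as np

def eta_squared(values, groups):
    """Proportion of total variance accounted for by group membership.

    One-way case only: between-group sum of squares over total sum of
    squares. (Simplified stand-in for the factorial analyses in the text.)
    """
    values = np.asarray(values, dtype=float)
    groups = np.asarray(groups)
    grand_mean = values.mean()
    ss_total = float(((values - grand_mean) ** 2).sum())
    ss_between = 0.0
    for g in np.unique(groups):
        member = values[groups == g]
        ss_between += len(member) * (member.mean() - grand_mean) ** 2
    return float(ss_between / ss_total)

# Synthetic ratings: a tiny mean difference in a very large sample.
rng = np.random.default_rng(2)
n = 20000
group = np.repeat(["a", "b"], n)
rating = np.concatenate([rng.normal(4.50, 1.0, n), rng.normal(4.55, 1.0, n)])

print(round(eta_squared(rating, group), 4))
```

With tens of thousands of observations, even a mean difference this small
can reach statistical significance while accounting for only a tiny share of
the variance, which is the pattern the results describe.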

Sex  Effects  on  the  Ratings 

A  2  (Rating  Source)  x  2  (Rater  Sex)  x  2  (Ratee  Sex)  MANOVA  was 
conducted~to  examine  sex  effects  on  the  ratings.  Again,  upon  obtaining 
a  significant  MANOVA,  univariate  analyses  were  examined.  These  results 
are  presented  in  Table  5.  The  means  for  each  rating  source  by  sex 
combination  are  shown  in  Table  6. 

Similar  to  the  race  analyses,  the  proportions  of  rating  variance 
accounted  for  by  the  significant  effects  were  minimal.  The  total 
proportions of variance accounted for by all rater source and sex
effects were as follows: Technical Skill and Job Effort (r² = .012),
Personal Discipline (r² = .001), and Military Bearing (r² = .004). In
addition, the directions of the significant main effects and
interactions were not consistent across the three rating factors.




Repeated  Measures  Analyses 


To  determine  how  the  results  reported  above  may  have  been  affected 
by  the  fact  that  the  rating  observations  were  not  independent  (i.e., 
raters  rated  multiple  ratees),  a  2  x  2  x  2  repeated  measures  MANOVA  was 
conducted  with  rater  source  (peer  or  supervisor)  and  rater  race  (black 
or  white)  constituting  the  between  subjects  factors,  ratee  race  (black 
or  white)  as  the  single  within  subjects  factor,  and  measures  of  the 
three  rating  factors  as  the  multiple  dependent  measures.  Unfortunately, 
sufficient  data  were  not  available  to  include  Hispanics  in  this 
analysis.  A  repeated  measures  MANOVA  like  that  described  above  was  also 
conducted  to  investigate  the  sex  effects.  These  analyses  yielded 
results  virtually  identical  to  those  reported  above.  The  only  exception 
was  that  in  the  repeated  measures  analysis,  the  two-  and  three-way 
interactions  involving  ratee  sex  were  nonsignificant  for  all  three 
dependent  measures. 

Discussion 

The  present  field  research  investigated  race  and  sex  effects  on 
peer  and  supervisor  ratings  of  ratees  occupying  a  variety  of  jobs.  The 
overwhelming  finding  was  that  the  proportions  of  variance  accounted  for 
by  gender  and,  especially,  race  were  less  than  have  been  found  in 
previous  research.  For  example,  Kraiger  and  Ford  (1985)  reported 
correlations  between  ratee  race  and  ratings  for  black  and  white  raters 
of  .183  and  -.220,  with  the  variance  accounted  for  by  these  correlations 
equal to three and five percent, respectively. In the present research,
the variance accounted for by, especially, the rater race by ratee race
interactions was substantially less than one percent.
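
As a quick check on the figures quoted from Kraiger and Ford (1985), the proportion of variance accounted for by a correlation is its square. The short Python sketch below reproduces the "three and five percent" figures from the two correlations cited above. The dictionary keys and the function name are illustrative labels only and do not come from the original report.

```python
# Correlations between ratee race and ratings, as quoted in the text above
# from Kraiger and Ford (1985). The keys are illustrative labels (assumed).
correlations = {"first_reported_r": 0.183, "second_reported_r": -0.220}


def variance_accounted_for(r: float) -> float:
    """Proportion of variance accounted for by a correlation r (r squared)."""
    return r ** 2


# Express each squared correlation as a percentage, rounded to one decimal.
variance_pct = {
    name: round(100 * variance_accounted_for(r), 1)
    for name, r in correlations.items()
}

for name, pct in variance_pct.items():
    print(f"{name}: {pct}% of rating variance")
```

Rounded to whole percentages, these values correspond to the three and five percent cited in the text, against which the present effects of well under one percent can be compared.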




One difference between the Kraiger and Ford (1985) research and
this research is that no corrections (e.g., for unreliability) were made
here. In order to enable a more direct comparison between Kraiger and
Ford’s results and those reported here, a meta-analysis similar to
Kraiger and Ford’s was conducted. The proportion of rating variance
accounted for by ratee race was still much less than reported in Kraiger
and Ford’s research. Thus, while we believe that future research should
focus on possible explanations for observed effects rather than on
effect sizes alone, it may be premature to accept Kraiger and Ford’s
analysis results as the best estimate of the ratee race effect size in
the population.

One explanation for the present race results is that raters were
trained to focus specifically on ratee job performance and to avoid
using nonperformance factors (e.g., sex, race) as a basis for their
evaluations. Another possible explanation is that racial bias may be
less prevalent in military versus civilian work settings due to
reasonably large percentages of minority service members. This
explanation is consistent with Kraiger and Ford’s (1985) finding that
race effects were less likely to be found when blacks constituted a
larger percentage of the workforce.

Because no meta-analysis has been conducted to estimate population
sex effect sizes on ratings, it is more difficult to compare the
magnitudes of the present sex effects to previous research. Further,
relatively few field studies have reported the rating variance accounted
for by gender. Nevertheless, in some cases, reported effect sizes have
been larger than those found here (e.g., Mobley, 1982), whereas in other
cases, sex has been shown to have no effect on ratings (e.g., Thompson &
Thompson, 1985) or to account for only minimal proportions of the rating
variance (Pulakos & Wexley, 1983).

Two additional points are worth mentioning regarding the results of
this research. First, Landy and Farr (1980) concluded that sex
stereotype of the occupation appears to interact with ratee sex such
that males receive more favorable evaluations than females in
traditionally masculine occupations but that no differences or smaller
differences in favor of females occur in traditionally feminine
occupations. Although significant job type x ratee sex interactions
were observed in this study, the proportions of the variance accounted
for by these effects were trivial. Beyond this, however, even the
nature of the significant effects did not provide support for Landy and
Farr’s sex-role stereotype hypothesis.

The second noteworthy point concerns the lack of sex x race
interactions found in this study. Because of inadequate sample sizes,
most performance appraisal field research has been unable to investigate
whether or not race and sex interact in their effects on ratings (see
Thompson & Thompson, 1985 for an exception). There has, however, been
some assessment center research (e.g., Huck & Bray, 1976; Schmitt &
Hill, 1977) in which significant race x sex interactions have been
observed. As an example, in the Schmitt and Hill study, black females
tended to be rated lower when they were in assessment groups with larger
proportions of white males. The results of the present study along with
nonsignificant race x sex interactions reported by Thompson and Thompson
(1985) seem to suggest that the interactive effects of race and sex
found in assessment center ratings do not generalize to performance
appraisal situations. It may be that because assessment centers are
characterized by relatively short durations of interpersonal contact
between assessors and assessees as well as a more limited amount of
ratee performance information, cues of race and sex may be more salient
to assessors, increasing the probability that these factors will have
greater influence on the ratings (Wendelken & Inn, 1981).

Although  the  present  research  clearly  shows  that  systematic  bias  as 
a  function  of  rater  or  ratee  sex  and  race  was  not  an  important  factor 
influencing  the  ratings,  future  research  could  examine  the  evaluation 
processes  involved  when  the  same  versus  different  race  or  sex  raters 
evaluate  ratees.  For  example,  irrespective  of  whether  or  not  there  are 
mean  subgroup  differences  in  ratings,  raters  may  use  different  cues  when 
evaluating  someone  of  a  different  race  or  sex  versus  a  person  of  the 
same  race  or  sex.  Policy  capturing  (e.g.,  Zedeck  &  Kafry,  1977)  or  a 
lens  model  approach  (Schmitt,  Noe,  &  Gottschalk,  1986)  are  possible 
strategies  for  investigating  such  similarities  and  differences. 

References

Bartol, K. M., & Butterfield, D. A. (1976). Sex effects in
evaluating leaders. Journal of Applied Psychology, 61, 446-454.

Huck, J. R., & Bray, D. W. (1976). Management assessment center
evaluations and subsequent job performance of white and black
females. Personnel Psychology, 29, 13-30.

Kraiger, K., & Ford, J. K. (1985). A meta-analysis of ratee race
effects in performance ratings. Journal of Applied
Psychology, 70, 56-65.

Landy, F. J., & Farr, J. L. (1980). Performance rating.
Psychological Bulletin, 87, 72-107.

Mobley, W. H. (1982). Supervisor and employee race and sex
effects on performance appraisals: A field test of adverse
impact and generalizability. Academy of Management Journal,
25, 598-606.

Pulakos, E. D., & Wexley, K. N. (1983). The relationship among
perceptual similarity, sex, and performance ratings in
manager-subordinate dyads. Academy of Management Journal,
26, 129-139.

Schmitt, N., & Hill, T. (1977). Sex and race composition of
assessment center groups as a determinant of peer and
assessor ratings. Journal of Applied Psychology, 62, 261-264.

Schmitt, N., Noe, R. A., & Gottschalk, R. (1986). Using the lens
model to magnify raters' consistency, matching, and shared
bias. Academy of Management Journal, 29, 130-139.

Thompson, D. E., & Thompson, T. A. (1985). Task-based
performance appraisal for blue-collar jobs: Evaluation of
race and sex effects. Journal of Applied Psychology, 70,
747-753.

Wendelken, D. J., & Inn, A. (1981). Nonperformance influences on
performance evaluations: A laboratory phenomenon? Journal of
Applied Psychology, 66, 149-158.

Zedeck, S., & Kafry, D. (1977). Capturing rater policies for
processing evaluation data. Organizational Behavior and
Human Performance, 18, 269-294.




Table 1

Breakdown of the Rater/Ratee Pairs by Race and Sex Composition

Rater Race/Ratee Race        N       Rater Sex/Ratee Sex        N

Black/Black                4,700     Male/Female              2,437
Black/White                7,391     Male/Male               20,773
Black/Hispanic               502     Female/Female            1,273
Hispanic/Black               617     Female/Male              1,686
Hispanic/White             1,365
Hispanic/Hispanic            115
White/Black                5,745
White/White               18,294
White/Hispanic               808

Total                     39,537     Total                   26,169

Note. The total number of dyads for the different sex combinations is
smaller than the total number of dyads for the race
combinations because five of the 19 MOS were combat jobs in
which there were no females. While the percentages of the
total sample represented by different race and sex combinations
are variable, they nevertheless accurately represent the
corresponding percentages found in the Army population.
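
The totals in Table 1 can be checked directly from the cell counts. The Python sketch below is only an illustrative tally: the counts are those printed in Table 1, while the variable names and the same-race share it derives are additions for illustration and are not statistics reported by the authors.

```python
# Dyad counts keyed by (rater, ratee), copied from Table 1.
race_dyads = {
    ("Black", "Black"): 4700, ("Black", "White"): 7391,
    ("Black", "Hispanic"): 502, ("Hispanic", "Black"): 617,
    ("Hispanic", "White"): 1365, ("Hispanic", "Hispanic"): 115,
    ("White", "Black"): 5745, ("White", "White"): 18294,
    ("White", "Hispanic"): 808,
}
sex_dyads = {
    ("Male", "Female"): 2437, ("Male", "Male"): 20773,
    ("Female", "Female"): 1273, ("Female", "Male"): 1686,
}

# Column totals, which should match the "Total" row of the table.
race_total = sum(race_dyads.values())
sex_total = sum(sex_dyads.values())

# Illustrative derived figure (not in the report): dyads in which rater
# and ratee are of the same race, as a share of all race dyads.
same_race = sum(n for (rater, ratee), n in race_dyads.items() if rater == ratee)
same_race_share = same_race / race_total

print(race_total, sex_total)
print(f"same-race dyads: {same_race} ({same_race_share:.1%})")
```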




Table 2

Army-Wide Factors

Name                     Definition

Technical Skill and      Exerting effort and showing proficiency
Job Effort               over the full range of job tasks; engaging in
                         training or other developmental activities to
                         increase proficiency; persevering under
                         dangerous or adverse conditions; and
                         demonstrating leadership and support towards
                         peers.

Personal Discipline      Adhering to Army rules and regulations;
                         exercising self-control; demonstrating
                         integrity in day-to-day behavior; and not
                         causing disciplinary problems.

Military Bearing         Maintaining an appropriate military appearance
                         and bearing and staying in good physical
                         condition.



Table 3

Analysis of Variance Results for Race

                          Technical Skill         Personal                Military
                          and Job Effort          Discipline              Bearing
                                  Nature of               Nature of               Nature of
Effect              df       F    Main Effects       F    Main Effects       F    Main Effects

Rating Source (A)    1    92.61*  P>S             19.54*  P>S              7.75*  S>P
Rater Race (B)       2    31.26*  B,H>W            8.45*  B,H>W            3.32*  B>W
Ratee Race (C)       2    47.27*  H>W>B           15.24*  H,W>B          209.94*  B,H>W
A x B                2     5.17*                   8.49*                   2.59*
A x C                2     1.69                    1.99                    1.25
B x C                4    28.99*                   3.87*                   6.69*
A x B x C            4     3.10*                   1.80                    2.67*

Note. *p < .05. Regarding the Rating Source main effects: P = peer raters and
S = supervisor raters. Regarding the Rater and Ratee Race main effects:
B = black, W = white, H = Hispanic.




Table 4

Means and Standard Deviations for Rater Group by Race Combinations

                     Technical Skill
                     and Job Effort       Personal Discipline    Military Bearing
                     Peer    Supervisor   Peer    Supervisor     Peer    Supervisor

White Rater
  White Ratee        4.47    4.29         4.53    4.54           4.70    4.77
                    (1.09)  (1.18)       (1.21)  (1.28)         (1.20)  (1.24)
  Black Ratee        4.20    3.95         4.40    4.42           5.07    5.15
                    (1.14)  (1.20)       (1.26)  (1.32)         (1.13)  (1.17)
  Hispanic Ratee     4.39    4.24         4.58    4.54           4.93    5.06
                    (1.03)  (1.12)       (1.17)  (1.25)         (1.17)  (1.17)

Black Rater
  White Ratee        4.44    4.33         4.56    4.56           4.60    4.79
                    (1.07)  (1.17)       (1.22)  (1.32)         (1.24)  (1.30)
  Black Ratee        4.53    4.19         4.59    4.49           5.16    5.25
                    (1.06)  (1.16)       (1.20)  (1.31)         (1.14)  (1.17)
  Hispanic Ratee     4.66    4.44         4.76    4.68           4.98    5.26
                    (0.94)  (1.11)       (1.21)  (1.27)         (1.13)  (1.25)

Hispanic Rater
  White Ratee        4.63    4.18         4.72    4.44           4.76    4.62
                    (1.04)  (1.23)       (1.19)  (1.36)         (1.19)  (1.33)
  Black Ratee        4.39    4.02         4.52    4.34           5.03    5.16
                    (1.07)  (1.22)       (1.26)  (1.44)         (1.17)  (1.21)
  Hispanic Ratee     4.99    4.41         5.13    4.42           5.28    5.27
                    (0.83)  (1.03)       (1.01)  (1.33)         (0.89)  (1.09)



Table 5

Analysis of Variance Results for Sex

                          Technical Skill         Personal                Military
                          and Job Effort          Discipline              Bearing
                                  Nature of               Nature of               Nature of
Effect              df       F    Main Effects       F    Main Effects       F    Main Effects

Rating Source (A)    1    53.68*  P>S              2.24                   25.21*  S>P
Rater Sex (B)        1     4.30*  F>M              3.54                   20.74*  F>M
Ratee Sex (C)        1    14.70*  M>F               .08                   23.20*  M>F
A x B                1     2.04                    2.20                     .43
A x C                1      .29                     .12                     .96
B x C                1     3.14                   16.43*                   8.79*
A x B x C            1    10.03*                   3.79                     .36

Note. *p < .05. Regarding the Rating Source main effects: P = peer raters and
S = supervisor raters. Regarding the Rater and Ratee Sex main effects:
M = male and F = female.




Table 6

Means and Standard Deviations for Rater Group by Sex Combinations

                     Technical Skill
                     and Job Effort       Personal Discipline    Military Bearing
                     Peer    Supervisor   Peer    Supervisor     Peer    Supervisor

Female Rater
  Male Ratee         4.54    4.36         4.62    4.60           5.00    5.14
                    (1.05)  (1.16)       (1.19)  (1.32)         (1.15)  (1.20)
  Female Ratee       4.46    4.14         4.56    4.40           4.76    4.94
                    (1.02)  (1.12)       (1.22)  (1.26)         (1.20)  (1.28)

Male Rater
  Male Ratee         4.47    4.22         4.56    4.53           4.81    4.90
                    (1.08)  (1.19)       (1.21)  (1.31)         (1.21)  (1.26)
  Female Ratee       4.32    4.25         4.65    4.68           4.72    4.88
                    (1.10)  (1.18)       (1.19)  (1.35)         (1.21)  (1.29)



DESIGNING, PLANNING, AND SELLING PROJECT A


Joyce  L.  Shields 
Lawrence  M.  Hanser 

U.S.  Army  Research  Institute 


Presented  at  the  Annual  Conference  of  the 
Society for Industrial and Organizational Psychology

Atlanta,  Georgia 

April  1987 


The  views  expressed  in  this  paper  are  those  of  the  authors  and  do  not 
necessarily  reflect  the  official  opinions  and  policies  of  the  U.S.  Army 
Research  Institute  or  the  Department  of  the  Army. 




Like  many  events,  Project  A  was  a  product  of  the  people  and 
time  of  its  conception.  In  this  paper  we  first  describe  the 
Zeitgeist  which  existed  prior  to  and  during  the  planning  of  this 
project.  Second,  we  discuss  the  design  of  the  project  as  a  prime 
example  of  successful  policy  research.  Finally,  we  address  the 
issue  of  selling  the  project  initially,  and  maintaining  support 
for  long-term  research  in  the  face  of  changing  problems  and 
goals. 

The  Zeitgeist 

The  events  which  shaped  the  Army  and  eventually  resulted  in 
Project A began several years earlier, in the 1970s. More than
14.9  million  American  youth  were  drafted  between  1940  and  1973 
(Nelson,  1983).  At  the  close  of  the  Vietnam  War  in  1973  the 
draft  came  to  an  end  and  the  All-Volunteer  Force  (AVF)  was  born. 
By  1975  first  term  attrition  had  reached  26.6%  among  high  school 
graduate  enlistees  and  51.4%  among  non-high  school  graduate 
enlistees,  both  record  highs.  Also  in  that  year,  only  58%  of 
Army  enlistees  had  a  high  school  diploma,  compared  with  90%  this 
year.  Although  the  size  of  the  Army  had  been  reduced  drastically 
from  the  Vietnam  War  era,  these  high  attrition  rates  placed  an 
enormous  burden  on  recruiting.  These  times  were  best  summarized 
in General Meyer's now famous White Paper (1980) on the 'Hollow
Army.'

In  addition  to  changes  in  the  personnel  system  of  the  Army, 
the  Army  was  beginning  the  largest  force  modernization  program 
since  World  War  II.  Anti-tank  weapons  were  now  becoming  wire- 
guided  missiles;  tanks  would  have  on-board  computer  systems  for 
gunnery  and  navigation;  infantry  squads  would  use  satellite 
communications  for  determining  their  battlefield  location;  and 
shoulder  fired  missiles  would  include  state-of-the-art 
electronics  for  aircraft  identification.  Further  complicating 
the  increasing  technical  demands  of  modern  equipment  was  the 
prediction  of  a  significant  decline  in  the  number  of  eligible 
youth  which  was  projected  to  begin  about  1982  and  continue 
through  1996.  Obviously  the  personnel  needs  of  the  Army  were 
facing  substantial  change  in  a  climate  of  declining  manpower 
supply. 

The  climate  was  also  unfavorable  to  testing.  The  nation  as 
a  whole  was  questioning  the  fairness  of  tests.  In  1978  the 
Uniform  Guidelines  were  published.  The  Congress,  in  1981,  issued 
a  directive  that  the  Services  must  "develop  a  better  database  on 
the  relationship  between  factors  such  as  high  school  graduation, 
entrance  test  scores,  age,  etc.,  and  potential  for  effective 
service."  Interest  in,  and  support  for  testing  research  in  the 
Army  had  declined  substantially.  The  Army  Research  Institute, 
the  traditional  home  for  selection  and  classification  research  in 
the  Army,  was  organized  into  two  laboratories  at  that  time,  the 
Training  Research  Laboratory  (TRL)  and  the  Organization  and 
Systems  Research  Laboratory  (OSRL).  OSRL  included  only  a  small 
team  of  people  devoted  to  selection  and  classification  research. 




In 1980, the Armed Services Vocational Aptitude Battery
Forms 6/7 (ASVAB 6/7), which was used operationally from 1976 to
1980, was discovered to have been misnormed. As a result of the
misnorming, in 1980, 50% of Nonprior Service Army Recruits were
drawn from the bottom 30% of the eligible youth population.
Today, over 60% of recruits come from the top 50% of the youth
population. With this large influx of low-scoring recruits in
the late 1970s, the Army began to question what difference entry
test  scores  made  in  terms  of  eventual  performance  in  military 
occupations.  That  is,  did  it  really  matter  whether  the  Army 
recruited  individuals  from  a  higher  percentile  in  the  youth 
population?  Unfortunately,  this  question  could  not  be  adequately 
addressed,  because  at  the  inception  of  the  AVF,  the  Training  and 
Doctrine  Command  (TRADOC)  introduced  criterion-referenced 
training,  go/no  go  testing,  and  mastery  learning,  so  that  no 
reasonable  criteria  existed.  Further,  Skill  Qualification  Test 
scores  (i.e.,  mid-career  tests  of  job  knowledge)  were  not  readily 
accessible,  and  often,  centralized  recordkeeping  systems,  where 
such  information  was  stored,  were  cross-sectional  rather  than 
longitudinal  in  nature. 

As  is  now  clear,  the  Army  was  facing  several  problems: 

Was  it  possible  to  demonstrate  a  relationship  between 
selection  tests  and  performance  in  military 
occupations? 

Could  selection  tests  be  used  to  identify  individuals 
more  likely  to  complete  their  tour  of  service? 

Given  the  declining  manpower  pool,  could  tests  be 
designed  to  more  efficiently  use  the  available 
resources? 

Could  individuals  be  better  allocated  to  the  diverse 
demands  of  the  Army  and  Army  occupations? 

Could  weapon  systems  be  designed,  with  enhanced  battlefield 
effectiveness,  which  would  match  the  available  skills  of  the 
declining  pool  of  operators  and  maintainers? 

These  problems  cut  across  a  number  of  Army  commands  and 
organizations,  so  that  resolving  them  was  important  to  a  wide 
variety  of  senior  Army  leaders.  The  project  did  not  spring  from 
a  desire  to  examine  the  issues  related  to  validity 
generalization,  or  rater  accuracy,  or  computerized  testing,  or  a 
basic desire to support industrial/organizational research.
Rather, it grew from the need to address some very real policy
issues.


The  People 




According to an Army Science Board report by Alexander
(1980), "It is not enough for a research community to exist, or
even  for  it  to  be  working  on  problems  of  concern  to  the 
policymakers.  Strong  and  intimate  links  are  essential  to 
transmit  problems  and  questions,  to  convert  them  into 
researchable  projects,  and  to  transmit  the  results  back  to  the 
client  —  not  as  research  reports  —  but  as  options, 
alternatives,  and  evaluations  that  the  policymaker  can  use...  a 
special  type  of  researcher  is  required  —  one  who  understands 
both  the  research  and  the  policy  worlds...  (there  is  also)  a 
requirement  for  people  on  the  Army  side  who  are  sensitive  to  the 
analytical  approach  and  to  the  potential  contributions  of 
research  to  policymaking." 

In  1980,  this  situation  existed.  Although  a  number  of  such 
people  were  in  key  positions  at  that  time,  we  would  be  remiss 
were  we  not  to  mention  the  presence  and  support  of  General 
Maxwell  R.  Thurman.  In  his  roles  as  Commander  of  the  U.S.  Army 
Recruiting  Command,  Deputy  Chief  of  Staff  for  Personnel,  Vice 
Chief  of  Staff  of  the  Army,  and  now  Commander  of  the  Training  and 
Doctrine  Command,  General  Thurman  continued  to  be  actively 
involved  in  this  research. 

The  Plan 

Upon examining the list of problems facing the Army in the
late 1970s, it is clear that a number of discrete policy
research  projects  could  have  been  designed  to  address  them.  In 
fact,  the  tendency  is  strong  for  that  to  happen.  However,  rather 
than  simply  forging  ad  hoc  solutions  to  the  laundry  list  of 
problems,  a  comprehensive  program  of  personnel  selection  research 
was established. But it was designed in such a way as to provide
the necessary basic data with which both to resolve ad hoc
problems and to address longer term scientific issues.

In  1981,  ARI  initiated  a  multiyear,  multimillion  dollar 
research  program,  consisting  of  two  interrelated  projects,  to 
relate  better  selection  and  classification  measures  and 
procedures  to  the  criterion  of  soldier  performance.  The 
objectives of this program were to:

Validate  ASVAB  against  soldier  performance 

Develop  new  selection  and  classification  measures  and 
procedures  to  optimize  the  soldier  requirements  match 

Design computer-based decision aids for managers of the
Army's  manpower  processes 

At  the  same  time,  ARI  organized  a  Manpower  and  Personnel  Research 
Laboratory  (MPRL),  which  included  the  Personnel  Utilization 
Technical  Area,  to  be  responsible  for  this  program  of  research. 

In the Spring of 1981, two teams of individuals from this
technical area began to prepare the statements of work which were
to become Project A (development and validation of enlistment
measures) and Project B (development of a computer-based system
to link personnel requirements with resources), addressing the
major  objectives  outlined  above.  After  several  months  of  writing 
and  rewriting,  the  Requests  for  Proposals  (RFP)  were  released  in 
the  fall  of  1981.  A  contract  for  Project  A  was  signed  with  the 
Human  Resources  Research  Organization,  American  Institutes  for 
Research,  and  Personnel  Decisions  Research  Institute  in  September 
1982. 

Project  A  as  Policy  Research 

According  to  Alexander  (1980),  successful  policy  research 
has  the  following  characteristics: 

Importance:  The  research  should  be  concerned  with  important 
issues. 

Crosscutting:  Topics  chosen  for  analysis  should  crosscut 
organizational  boundaries. 

Understanding  the  environment:  Researchers  need  to 
understand  the  decision  environment  and  bureaucratic  context 
of  the  policymaker. 

Confidence  and  trust:  Policymakers  must  have  confidence  in 
the  technical  ability  of  the  researchers  and  the  researchers 
should  view  their  clients  as  people  who  value  their  efforts. 

Accountability:  Researchers  must  be  accountable  for  their 
products  and  their  results.  The  research  should  be 
available  to  others  for  inspection,  review,  and  debate. 

Tolerance  for  wrong  answers:  The  probability  of  "wrong", 
ambiguous,  and  complicated  results  must  be  recognized  and 
accepted  by  clients. 


The designing and planning of Project A is related to the
characteristics described above in the following ways:

Importance.  As  discussed  previously  the  problems  addressed 
in  this  research  program  are  of  great  importance  to  the  Army. 

Crosscutting. These problems are of interest to many
constituencies, including personnel and training proponents.
Both the military and civilian sides of the Army are equally
interested in the results, although for different reasons.

Understanding the Environment. The researchers understand
the Army and are given/allowed access to top policy makers in the
Army. There are many different points of view, and researchers
are given the opportunity to question strongly held positions.
The researchers are problem oriented and responsive, willing to
work quickly when possible to provide short term answers in
exchange for a long term commitment on the part of policymakers.
Researchers continue to provide information back to policymakers
in terms of options, alternatives, and evaluations - not just
research reports.

Confidence and trust. Key policymakers have confidence and
trust in the technical ability of the researchers and their
understanding of the problems. Key policymakers invested, and
continue to invest, in the researchers, and provide time, access
to sensitive data, and all necessary support as well as trust and
confidence.

Accountability.  The  research  plan  was  well  founded  on  a 
sound  scientific  base.  It  was  and  continues  to  be  open  for 
inspection  by  both  the  scientific  and  policy  communities. 

Tolerance  for  wrong  answers.  The  Army  clients  are  not  only 
open  to  results  -  whether  or  not  prior  beliefs  are  confirmed,  but 
they  have  been  and  continue  to  be  willing  to  use  the  results  to 
change  and  set  policy. 


The  Changing  Environment 

As researchers, we have a tendency to judge the success of a
project by how well we have solved the problems which originally
generated it. The list of personnel selection and classification
problems which the Army faces today would be somewhat different
from the list we mentioned earlier. Policymakers are not
interested in solutions to problems which no longer exist, but
rather in the problems which they face today. The challenge for
Project A, and all such long-term projects, is to continually
readjust as policy problems change, so that the research remains
relevant to policymakers.


References 


Alexander, Arthur (1980). Policy Research on Human Issues. Washington,
D.C.: Army Science Board, Human Issues Group #2.

Nelson, Gary R. (1983). The Supply and Quality of First-Term Enlistees Under
the All-Volunteer Force. In W. Bowman, R. Little, and G. T. Sicilia
(Eds.), The All-Volunteer Force After a Decade. Washington, D.C.:
Pergamon-Brassey's International Defense Publishers.

Meyer, Edward C. (1980). The Hollow Army (White Paper). Washington, D.C.:
Department of the Army.




PREDICTIVE VALIDITY OF NONCOGNITIVE MEASURES FOR
ARMY CLASSIFICATION AND ATTRITION


Hilda Wing

U.S. Army Research Institute

Leaetta M. Hough
Norman G. Peterson

Personnel Decisions Research Institute


Presented  at  the  Annual  Conference  of  the 
Society  for  Industrial  and  Organizational  Psychology 

Atlanta,  Georgia 

April  1987 


The  views  expressed  in  this  paper  are  those  of  the  authors  and  do  not 
necessarily reflect the official opinions and policies of the U.S. Army
Research  Institute  or  the  Department  of  the  Army. 




Predictive Validity of Noncognitive Measures
for Army Classification and Attrition


Abstract 

Over  9,000  soldiers  in  four  military  occupational  specialties  were 
administered  vocational  interest  measures,  biographical  questionnaires,  and 
temperament  surveys  as  they  entered  their  service  careers.  Approximately 
nine months later, follow-up research determined whether these soldiers were
still in their initial occupations or even still in the Army. Selected
vocational  interest  measures  predicted  occupational  classification  fairly 
well  for  two  of  the  four  occupations;  selected  biodata  and  temperament 
measures  predicted  early  attrition  fairly  well  given  the  low  base  rate  of 
this  dependent  variable. 


Predictive Validity of Noncognitive Measures
for Army Classification and Attrition

Many (for example, Campbell, 1986) have argued that performance is
inherently multidimensional. One way of conceptualizing performance space
is to divide it into "can-do" and "will-do" subspaces. The former might
consist of the technical skills and abilities indexing the maximum quality
and quantity of productivity of which an individual is capable. The latter
would then be composed of those attitudes and characteristics indexing the
typical performance level of the individual. Cognitive abilities predict
the former performance subspaces fairly well; noncognitive measures such as
vocational interests, biodata, and temperament measures provide some promise
for predicting the latter. It is these typical performance measures which
are of concern in this report.

As part of the Army's Project A, a Preliminary Battery of paper and
pencil measures was administered to soldiers in four selected Military
Occupational Specialties (MOS) as they entered military service during late
1983 and early 1984. The battery included measures of vocational interests,
individual history background or biodata, and temperament. These measures
were used to predict whether a soldier would still be in the Army some time
after initial training, in this case, December, 1984. The average length of
time a soldier had been in the service was nine months, with a range from
six to eighteen months.

The hypotheses of interest concerned classification and prediction.
Hypothesis One concerned the efficacy of vocational interest measures in
predicting MOS membership: Members of the four very different MOS should
show different average scores on the vocational interest measures. Such a
finding would provide support for the "gravitational hypothesis" (McCormick,
Jeanneret, & Mecham, 1972), which suggests that people of different interests
"gravitate" towards those occupations which they find most compatible.
Hypothesis Two concerned the efficacy of all the noncognitive measures in
predicting early attrition, that is, whether the tested soldier was still in
the service by December, 1984, when the records were evaluated. This hypothesis
has two components, one for the biodata and temperament measures and one for
the vocational interests. For the former, prior research (Hough, Dunnette,
Wing, Houston, & Peterson, 1984) has shown biodata and temperament measures
to cover the same constructs or variables. In this instance, the concern
was with the attitudinal and socialization variables which might predict
whether an individual would be ill-behaved, hence a potential discipline
problem, or would have academic difficulty with Army training. For the
latter, corroborating evidence would consist of attritees having less
compatible interests for a given MOS than the stayers. Cognitive variables
are known to be effective in predicting training criteria; the question here
was whether the noncognitive variables could provide predictability in
addition to the cognitive variables currently used in Army selection and
classification.




Method 


Research  Participants 

The population from which these examinees were selected consisted of
those  soldiers  (recruits)  who  had  entered  active  duty  in  the  Regular  Army 
and  who  had  begun  training  in  one  of  four  MOS  at  one  of  five  selected  Army 
posts  between  October  1,  1983,  and  June  30,  1984,  as  follows: 

MOS 19A: Tank Crewman. The sample consisted of 2,614 male soldiers.

MOS  31C:  Radio  Teletype  Operator.  The  sample  consisted  of  1,989 
soldiers,  which  included  280  females. 

MOS  63B:  Vehicle  and  Generator  Mechanic.  The  sample  included 
2,197  soldiers  of  whom  129  were  female. 

MOS  71L:  Administrative  Specialist.  The  sample  included  2,798 
soldiers  of  whom  1,350  were  female. 

The  groups  were  ethnically  diverse,  each  having  about  five  percent 
Hispanics  and  over  twenty  percent  Blacks,  except  for  the  71L's,  of  which 
over  forty  percent  were  Black.  No  analyses  were  performed  separately  by 
race/ethnicity  or  by  sex. 

Variables 


Predictors 

ASVAB. Before entry into military service, each soldier had taken the
Armed Services Vocational Aptitude Battery (ASVAB), a 3 1/2 hour cognitive
test battery used for selection and classification into all the military
services. First, all recruits had to achieve a minimum score on a composite
known as the Armed Forces Qualification Test (AFQT), summed from scores on
four subtests. High school graduates had to be at or above the 21st
percentile, while nongraduates had to be at or above the 31st percentile,
based on World War II norms for male military personnel. Second, each MOS
had a specific composite on which a minimal score was required for entry.
These minima were roughly equivalent (McLaughlin, Rossmeissl, Wise, Brandt,
& Wang, 1984) to the 26th percentile of the AFQT for Armor Crewman and
Mechanic, and to the 39th percentile for Radio Teletype Operator and
Administrative Specialist.

Preliminary Battery (PB). The PB required about four hours to
administer. It included eight spatial/perceptual measures which will not be
discussed further here. Also included were the 18 scales from the Air Force
Vocational Interest Career Examination (VOICE; Alley & Matthews, 1982); five
temperament scales adapted from published scales [two from the Differential
Personality Questionnaire or DPQ (Tellegen, 1982); one from the California
Psychological Inventory or CPI (Gough, 1975); the Rotter I/E Scale (Rotter,
1966); and validity scales from both the DPQ and the Personality Research
Form or PRF (Jackson, 1967)]; and Owens' (Owens & Schoenfeldt, 1979)
Biographical Questionnaire (BQ). The BQ was scored for 22 scales based on
prior analyses of an initial sample (Hough et al., 1984). Items tapping
religion or socioeconomic status had been deleted while items tapping
curricula, coursework, and physical fitness had been added. These prior
analyses had determined the structure of these sex-independent scales.


The  names  of  the  scales  used  in  the  analysis  reported  here,  with  their 
numbers  of  items,  are  as  follows.  The  range  and  median  values  of 
coefficient  alpha  for  each  set  are  also  given.  Each  measure  was 
administered  untimed. 


Vocational  Interest  Career  Examination  (VOICE).  This  typically  takes 
15-20  minutes.  The  scales  include  Office  Administration  (OAD:  20  items); 
Heavy  Construction  (HC:  20  items);  Electronics  (ELE:  20  items);  Medical 
Service  (MED:  20  items);  Science  (SCI:  20  items);  Outdoors  (OUT:  15 
items);  Aesthetics  (AES:  15  items);  Mechanics  (MEC:  15  items);  Food 
Service  (FS:  15  items);  Law  Enforcement  (LAW:  15  items);  Agriculture  (AG: 
15  items);  Mathematics  (MTH:  12  items);  Audiographics  (AUD:  10  items); 
Teacher/Counseling (TEA: 10 items); Marksman (MRK: 10 items); Drafting
(DFT: 7 items); Craftsman (CFT: 7 items); Automated Data Processing (ADP:
7  items).  Coefficient  alphas  ranged  from  0.75  to  0.96  with  a  median  of 
0.89. 


It was hypothesized that the following scales would be most useful for
the selected MOS: HC, MRK, ELE, OAD, and MEC. MEC would be the scale for
the  Mechanics,  OAD  would  be  the  scale  for  the  Administrative  Specialists,  HC 
and  MRK  would  be  scales  useful  for  the  Tank  Crewman,  and  OAD  and  ELE  would 
be  useful  scales  for  Radio  Teletype  Operator.  Two  additional  scales,  ADP 
and  MTH,  might  also  be  useful  in  distinguishing  the  Administrative 
Specialists  from  the  Radio  Teletype  Operators. 

Personal  Opinion  Inventory  (POI).  This  typically  takes  20-25  minutes. 
The  scales  included  Conscientiousness  (CON:  10  items,  from  DPQ  Unlikely 
Virtues  and  PRF  Infrequency);  Social  Potency  (SP:  27  items,  DPQ  Social 
Potency); Stress Reaction (SR: 36 items, DPQ Stress Reaction);
Socialization (SOC: 30 items, from CPI Socialization); Rule Abiding (RA: 9
items,  from  CPI  Socialization);  Family  Closeness  (FC:  7  items,  from  DPQ 
Stress  Reaction);  Effort  vs.  Luck  (LCK:  16  items,  from  Rotter  I/E  Scale); 
Internal  Locus  of  Control  (LOC:  29  items,  from  Rotter  I/E  Scale).  The 
coefficient  alpha  for  CON  was  0.44;  excluding  this  validity  scale,  the  range 
for  coefficients  alpha  was  0.55  to  0.90  with  a  median  of  0.60. 

Owens' Biographical Questionnaire (BQ). This typically takes 20-25
minutes. The scales included Academic Achievement (AA: 8 items);
Adjustment (ADJ: 12 items); Athletic Interests (ATH: 2 items);
Cultural-Literary (CL: 3 items); Independence (IND: 8 items);
Intellectualism (INT: 3 items); Leadership (LEAD: 12 items); Physical
Activity/Condition (PA: 15 items); Positive Academic Attitude (PAA: 7
items); Parental Control (PC: 11 items); Parental Closeness (PCLO: 15
items); Sociability-Popularity (POP: 9 items); Sibling Harmony (SIBH: 5
items); Scientific Orientation (SO: 12 items). Other variables in the BQ
requested information about academics and coursework. Coefficient alphas
ranged from 0.49 to 0.88 with a median of 0.75.


Demographics. The additional variable, whether a soldier had
graduated from high school, was available from Army files.


It  was  hypothesized  that  the  following  temperament  and  biodata  scales 
would index the motivational and attitudinal variables that could predict
attrition  for  cause:  RA,  SOC,  and  SR  from  the  POI  and  AA  from  the  BQ. 
Because  high  school  diploma  status  had  proven  itself  as  a  predictor  of  early 
attrition  in  much  prior  research,  it  also  was  included,  as  was  the  AFQT 
(cognitive)  score. 


Criteria 


Classification. This variable was the nominal one of the MOS in which the
soldier had begun his/her military service; it was available at the time
attrition data were collected.


Attrition. For each soldier in the sample, file data were available as
of December 31, 1984, indicating whether the soldier was still enlisted or,
if not, how the soldier had been discharged. The file data are
administrative codes indicating the recorded reasons why the attrition has
occurred. Three categories of attrition were developed for this research.
These categories, with sample file codes, are: Leave for Good Reason (to
attend service academy, medical discharge, hardship); Trainee Discharge
Program or TDP; Leave for Bad Reason (drug use, desertion, serious crime).
The Trainee Discharge Program refers to a set of administrative procedures
which permit a comparatively simple dismissal of soldiers, typically within
the first 180 days of service, for "failure to adapt" to the Army. While
the behavioral characteristics of this category are imprecise, it appears to
refer more to motivational than academic weaknesses which prohibit a soldier
from making a satisfactory adjustment to Army life. The best single
predictor of such early attrition for males is high school diploma status:
High school graduates are much more likely to complete their tours of
enlistment.


The numbers of soldiers in each category were as follows:

19A (Tank Crewman): Not Attrit = 2,299; Trainee Discharge Program =
107; Bad Attrit = 73; Good Attrit = 135.

31C (Radio Teletype Operator): Not Attrit = 1,750; Trainee Discharge
Program = 141; Bad Attrit = 51; Good Attrit = 47.

63B (Mechanic): Not Attrit = 2,066; Trainee Discharge Program = 33;
Bad Attrit = 61; Good Attrit (or Missing) = 37.

71L (Administrative Specialist): Not Attrit = 2,540; Trainee Discharge
Program = 121; Bad Attrit = 57; Good Attrit (or Missing) = 77.




The  standard  procedure  is  to  remove  the  cases  of  attrition  for  good 
reasons  before  analysis,  which  was  followed  here.  The  attrition  rate  was 
low,  ranging  from  four  percent  for  the  Mechanics  to  ten  percent  for  the 
Radio  Teletype  Operators. 
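
The attrition rates quoted here follow from the category counts once the good-reason cases are set aside. The following is a minimal Python sketch of that arithmetic; the dictionary layout and the function name are illustrative and are not part of the original analysis.

```python
# Category counts as reported above:
# (not attrit, Trainee Discharge Program, bad attrit, good attrit or missing).
COUNTS = {
    "Tank Crewman": (2299, 107, 73, 135),
    "Radio Teletype Operator": (1750, 141, 51, 47),
    "Mechanic": (2066, 33, 61, 37),
    "Administrative Specialist": (2540, 121, 57, 77),
}


def attrition_rate(stay, tdp, bad, good):
    """Share of TDP plus bad-reason leavers among soldiers kept for analysis.

    Good-reason leavers are excluded from the denominator, mirroring the
    removal of those cases before analysis.
    """
    return (tdp + bad) / (stay + tdp + bad)


rates = {mos: attrition_rate(*counts) for mos, counts in COUNTS.items()}

# Pooled rate across the four MOS, again excluding good-reason cases.
leavers = sum(c[1] + c[2] for c in COUNTS.values())
analyzed = sum(c[0] + c[1] + c[2] for c in COUNTS.values())
overall_rate = leavers / analyzed

for mos, rate in rates.items():
    print(f"{mos}: {rate:.1%}")
print(f"Overall: {overall_rate:.1%}")
```

Rounded to whole percentages, the Mechanic and Radio Teletype Operator rates match the four and ten percent cited in the text, and the pooled rate matches the seven percent cited later.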

Analyses 

Data  Editing 


Predictors.  Records  were  initially  checked  for  consistency  of  Social 
Security  Number,  race,  and  sex,  within  person  and  across  inventories.  There 
were  several  data  quality  screens  for  the  instruments  used  here.  Details  of 
the  procedures  can  be  found  in  Hough  et  al.  (1984).  For  the  VOICE  and  the 
BQ,  there  was  a  three-step  process  to  eliminate  records  which  contained  too 
many  missing  data  to  yield  interpretable  scores.  There  were  four  steps  in 
the  process  for  the  POI,  the  extra  step  being  the  employment  of  a  validity 
screen  via  application  of  the  CON  scale.  Two  percent  of  the  VOICE  cases  and 
three  percent  of  the  BQ  cases  were  deleted  because  of  missing  data.  Two 
percent  of  the  POI  cases  were  deleted  for  the  missing  data  rule,  while  five 
percent  were  dropped  because  of  the  CON  screen.  For  item  analysis 
purposes,  as  well  as  subsequent  analyses,  sample  sizes  varied  across  scales 
within  these  inventories  as  well  as  across  them. 

Criteria.  Classification.  The  criterion  here  was  membership  in  one  of 
the  four  MOS,  available  from  the  test  records. 

Criteria.  Attrition.  As  discussed  above,  file  data  provided  attrition 
codes  so  the  editing  problem  was  one  of  matching  the  PB  cases  to  the  file 
cases. 

Descriptive  Statistics 

For  each  of  the  five  substantive  POI  scales,  the  19  BQ  scales,  and  the 
18  VOICE  scales,  coefficient  alphas  were  calculated  and  have  been  reported 
above.  Means  and  standard  deviations  for  each  scale  were  calculated  for  the 
total  sample  and  for  various  subgroups  of  interest  as  formed  by  demographic 
and  dependent  variables  such  as  MOS,  high  school  diploma  status,  and 
attrition  category.  These  descriptive  statistics  will  not  be  discussed 
further  here. 

Inferential  Statistics 

The uniqueness ($U^2$) of each predictor scale from the ASVAB was
calculated. Uniqueness is the amount of reliable variance of a given
variable not predicted by another variable or set of variables. The
computational formula is $U^2 = r_{xx} - R^2$, where $U^2$ = uniqueness,
$r_{xx}$ = the reliability of the variable of interest, and $R^2$ = the
squared multiple correlation obtained when the variable of interest is
regressed on some other set of variables. [See Wise and Mitchell (1985) for
a more extended treatment of uniqueness.]
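
The uniqueness computation can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the original procedure: a single predictor stands in for the full ASVAB set (so the squared Pearson correlation plays the role of the squared multiple correlation), the scores are hypothetical, and the function names are invented for the example.

```python
def squared_correlation(x, y):
    """Squared Pearson correlation: R^2 when x is regressed on the single variable y."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy * sxy / (sxx * syy)


def uniqueness(reliability, r_squared):
    """Reliable variance not predicted by the other variable(s): reliability minus R^2."""
    return reliability - r_squared


# Hypothetical scores for six soldiers on one interest scale and on the AFQT.
scale_scores = [12, 15, 9, 20, 14, 17]
afqt_scores = [45, 60, 38, 52, 70, 41]

r2 = squared_correlation(scale_scores, afqt_scores)
# 0.89 is the median VOICE alpha reported earlier; pairing it with these
# made-up scores is purely illustrative.
u2 = uniqueness(0.89, r2)
print(f"R^2 = {r2:.3f}, uniqueness = {u2:.3f}")
```

A scale with high reliability and little overlap with the cognitive battery has uniqueness close to its reliability, which is the pattern the reported ranges describe.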




The  ranges  and  median  values  of  the  uniqueness  coefficients  for  the 
variables  considered  here  were  as  follows.  For  the  VOICE,  the  range  was 
0.64  to  0.85,  with  a  median  of  0.77.  For  the  POI,  the  range  was  0.54  to 
0.86,  with  a  median  of  0.60.  For  the  BQ,  the  range  was  0.43  to  0.86  with  a 
median  of  0.70.  Thus,  this  set  of  measures  is  capturing  much  reliable 
variance  which  is  not  being  picked  up  by  the  cognitive  test  battery. 

The next step was the computation of a series of analyses of variance
with each predictor scale. The grouping variable in each analysis was
either attrition (Not Attrit, Trainee Discharge Program, and Bad Attrit) or
classification (MOS membership).

The  next  analyses  were  in  direct  reference  to  the  hypotheses  stated 
above.  For  Hypothesis  One,  discriminant  function  analysis  was  performed 
using  the  selected  VOICE  scales  as  predictors  and  MOS  memberships  as  the 
criterion.  For  Hypothesis  Two,  multiple  regression  analyses  were  performed, 
using  selected  BQ,  POI,  and  VOICE  scales  to  predict  attrition  status. 

Results  and  Discussion 

Hypothesis  One  stated  that  the  four  MOS  would  differ  in  the  average 
scores of selected interest scales, as evaluated by the VOICE. Both
generalized analyses of variance and discriminant analyses showed
this to be the case. For the analyses of variance, each of the 18 VOICE
scales as well as the AFQT significantly (p < .01) discriminated among the
four occupations. Discriminant analyses were used with, first, five VOICE
scales (HC, MRK, ELE, OAD, MEC) and, second, with two additional scales
(ADP,  MTH).  As  displayed  in  Table  1,  both  the  Administrative  Specialists 
and  the  Mechanics  were  fairly  well  predicted  (76%  and  69%  correct 
predictions,  respectively),  somewhat  better  with  the  five  scales  than  with 
the  seven  scales. 

The  Tank  Crewmen  were  less  well  predicted,  although  adding  the  two 
scales  to  the  initial  five  helped  somewhat.  The  Radio  Teletype  Operators 
were  predicted  least  well.  The  seven  scales  did  somewhat  better  than  the 
five,  but  neither  group  of  scales  predicted  membership  in  this  MOS  at  much 
greater  than  a  chance  level.  One  obvious  explanation  for  the  difference  in 
predictability  among  the  four  MOS  lies  in  the  origin  of  the  VOICE 
instrument.  It  was  designed  for  Air  Force  specialties  and  included 
occupational  scales  pertinent  to  them.  Administrative  Specialists  and 
Mechanics  are  common  to  all  military  services,  hence  it  is  not  surprising 
that  these  two  occupations  are  well  predicted.  Tank  Crewman  and  Radio 
Teletype  Operator,  on  the  other  hand,  are  uniquely  Army  occupations  and  did 
not  have  specific  VOICE  scales.  It  should  not  be  surprising  that  these  two 
occupations  were  less  well  predicted. 
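
The paper does not say how "chance level" was defined. Two common conventions for discriminant analysis can be computed from the "Percent of Total Group" column of Table 1. The sketch below assumes those conventions and uses illustrative names; it is not taken from the original analysis.

```python
# "Percent of Total Group" column of Table 1, expressed as proportions.
BASE_RATES = {
    "Tank Crewman": 0.27,
    "Radio Teletype Operator": 0.21,
    "Mechanic": 0.23,
    "Administrative Specialist": 0.29,
}


def proportional_chance(base_rates):
    """Expected overall hit rate when cases are assigned at random
    in proportion to group size (sum of squared base rates)."""
    return sum(p * p for p in base_rates.values())


def maximum_chance(base_rates):
    """Overall hit rate obtained by assigning every case to the largest group."""
    return max(base_rates.values())


chance_overall = proportional_chance(BASE_RATES)
chance_largest = maximum_chance(BASE_RATES)
print(f"proportional chance: {chance_overall:.3f}")
print(f"maximum chance: {chance_largest:.2f}")
```

Under either convention, a within-group hit rate in the twenties is close to chance for these four groups, while the 69 and 76 percent figures for Mechanics and Administrative Specialists are well above it.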

For  Hypothesis  Two,  the  prediction  of  attrition,  the  biodata  and 
temperament  scales  will  be  considered  first.  In  the  generalized  analyses  of 
variance,  most  of  the  BQ  scales  and  most  of  the  POI  scales  predicted 
attrition  in  the  anticipated  direction.  The  nonpredicting  scales  had  more 
to do with cooperativeness types of variables (e.g., BQ: SIBH, POP; POI:
SP) while the predicting scales had more to do with socialization and
achievement traits (e.g., BQ: AA, LEAD, ADJ, INT; POI: SR, RA, SOC).
Most of these scales also correlated with high school diploma status,
although not quite as consistently nor as strongly. This probably reflects
that many of the variables contributing to a young person's completion of
high school are being evaluated by the biodata and temperament measures used
here.


Multiple regression analyses of attrition split the criterion into two
classes: stay or leave (the latter being both the Trainee Discharge Program
and Bad Attrit). Keep in mind that the overall rate of attrition is quite
low, at seven percent. This will make prediction difficult. As displayed
in Table 2, the traditional predictors of attrition, high school diploma
status combined with AFQT, were significantly but mildly related to
attrition. The combination of the four hypothesized biodata-temperament
scales was also significantly related to attrition, with an adjusted R
almost twice the size of that yielded by the traditional predictors. (Other
biodata-temperament scales could have been selected but it is likely the
results would have been the same.) Combining the two classes of predictors
did not improve the prediction in any noticeable way beyond that provided by
the biodata-temperament scales. The values of the adjusted R's are
relatively small, but recall that the overall rate of attrition to be
predicted was also small (only seven percent attrition). Such a severe
split on the criterion operates to reduce the expected correlation with
other variables. Further, the increase in validity over the traditional
predictor of high school diploma status, as provided by the four
noncognitive scales, suggests that the latter may provide a useful addition
to Army selection and classification procedures.
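
The effect of a severe criterion split on the attainable correlation can be illustrated with the standard point-biserial ceiling. The sketch below assumes a normally distributed predictor and a dichotomy formed by cutting an underlying normal variable that the predictor tracks perfectly; these assumptions and the function name are introduced for illustration and are not part of the original analysis.

```python
from statistics import NormalDist


def max_point_biserial(base_rate):
    """Ceiling on the correlation between a normal predictor and a
    dichotomous criterion whose smaller class has proportion `base_rate`.

    With a perfect underlying relationship, the point-biserial correlation
    equals phi(z) / sqrt(p * q), where z is the normal cut point.
    """
    std_normal = NormalDist()
    z = std_normal.inv_cdf(1.0 - base_rate)
    p, q = base_rate, 1.0 - base_rate
    return std_normal.pdf(z) / (p * q) ** 0.5


ceiling_even = max_point_biserial(0.50)  # an even 50/50 split
ceiling_low = max_point_biserial(0.07)   # the seven percent attrition rate
print(f"ceiling at 50 percent: {ceiling_even:.3f}")
print(f"ceiling at 7 percent: {ceiling_low:.3f}")
```

Under these assumptions the ceiling falls from about 0.80 with an even split to about 0.53 at a seven percent base rate, so observed correlations of 0.08 to 0.15 understate the strength of the underlying relationship.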

Table  3  is  an  expectancy  table  displaying  the  predicted  attrition  rates 
of  soldiers  selected  on  the  composite  of  the  four  noncognitive  scales.  As 
can  be  seen,  using  a  cutoff  standardized  score  of,  say,  25  on  this  attrition 
predictor  would  eliminate  two  percent  of  all  applicants,  but  the  attrition 
rate  in  the  cutoff  group  was  predicted  to  be  28  percent  rather  than  the 
seven  percent  over  all.  A  cutoff  score  of  50,  on  the  other  hand,  would 
eliminate  half  of  all  these  soldiers  who  would  have  had  an  attrition  rate 
only  slightly  higher  (ten  percent)  than  that  of  the  total  group.  While  the 
choice  of  a  specific  cutoff  score  has  many  arbitrary  aspects  to  it,  it  seems 
clear  that  a  low  cutoff  score  based  on  a  biodata-temperament  scale  could 
eliminate  a  small  percentage  of  soldiers  who  could  be  predicted  to  have  an 
attrition  rate  significantly  higher  than  average.  This  result  obviously 
requires  replication,  both  for  this  group  as  they  continue  their  Army 
service,  as  well  as  for  other,  different  groups  of  soldiers. 
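
The trade-off described above can be made concrete by computing the attrition rate among soldiers who would remain after applying a cut-off. The following sketch uses the rounded values quoted from Table 3, so its results are approximate; the function and variable names are illustrative.

```python
def retained_attrition_rate(overall_rate, fraction_below, rate_below):
    """Attrition rate among those at or above the cut-off.

    The overall rate is a weighted average of the two groups:
    overall = fraction_below * rate_below + (1 - fraction_below) * rate_above,
    which is solved here for rate_above.
    """
    return (overall_rate - fraction_below * rate_below) / (1.0 - fraction_below)


# Quoted Table 3 entries:
# cut-off -> (attrition rate below cut-off, fraction of total group below cut-off).
TABLE_3 = {25: (0.28, 0.02), 50: (0.10, 0.50)}
OVERALL = 0.07

retained = {
    cutoff: retained_attrition_rate(OVERALL, fraction_below, rate_below)
    for cutoff, (rate_below, fraction_below) in TABLE_3.items()
}
for cutoff, rate in retained.items():
    print(f"cut-off {cutoff}: {rate:.3f} attrition among those retained")
```

With these rounded inputs, a cut-off of 25 lowers attrition among retained soldiers only slightly (to about 6.6 percent) at the cost of two percent of the group, whereas a cut-off of 50 lowers it to about four percent but rejects half the group.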

The  second  part  of  Hypothesis  Two  concerned  how  well  interest  measures 
might  predict  attrition.  Using  the  same  five  interest  scales  included  in 
Hypothesis  One,  for  classification,  in  addition  to  the  four  biodata- 
temperament  scales,  yielded  adjusted  multiple  R's  ranging  from  0.11  to  0.20 
across  the  four  MOS.  These  are  displayed  in  Table  4.  The  variation  in 
values  directly  mirrors  the  base  rates  of  attrition  in  the  MOS  to  be 
predicted. The highest R was in the MOS with the highest rate of attrition,
Radio Teletype Operator, while the lowest was in the MOS with the lowest
rate of attrition, Mechanic. Thus, while it might appear that vocational
interest  measures  could  also  be  useful  as  predictors  of  early  attrition, 
this  conclusion  must  be  tempered  here  by  the  very  limiting  values  of  the 
differing  and  low  attrition  rates.  That  is  why  no  further  investigation  of 
specific  VOICE  scales  was  undertaken  at  this  point.  It  would  seem  that  a 
vocational  interest  battery  specifically  tailored  to  Army  jobs  (which  is 
part  of  other  Project  A  research)  should  be  used  and  evaluated  for 
discriminative  efficiency  before  such  measures  might  be  used  operationally 
as predictors of attrition. The case for biodata-temperament measures is
much  stronger. 

In  conclusion,  this  research  has  demonstrated  that  noncognitive 
measures  can  be  effective  predictors  of  two  aspects  of  Army  performance, 
classification  and  attrition.  Classification  was  better  predicted  when  the 
occupational  interest  scale  was  appropriate  to  the  Army  occupation  being 
predicted.  While  the  attrition  regression  coefficients  were  relatively  low 
in  value,  this  was  probably  due,  at  least  in  part,  to  the  low  base  rates  of 
attrition to be predicted. It may be that different analytic methods, such
as probit or logit analysis (Aldrich and Nelson, 1984), might better
explicate  the  relationship  between  attrition  and  the  noncognitive  measures 
used  in  this  research.  Follow-on  research  with  this  cohort,  as  its  members 
move  through  their  Army  careers,  should  explore  these  methods.  It  is  also 
likely  that  the  rates  of  attrition  will  increase  as  the  cohort  "ages," 
operating  to  improve  the  chances  for  accurate  prediction. 
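
As a rough illustration of the logit approach mentioned above, the sketch below fits a one-predictor logistic model by plain gradient ascent. The data are hypothetical standardized scores and stay/leave outcomes invented for the example, and the fitting routine is a teaching sketch rather than the method of any cited source.

```python
import math


def fit_logit(x, y, lr=0.1, steps=5000):
    """Fit P(y = 1) = 1 / (1 + exp(-(a + b * x))) by gradient ascent
    on the mean log-likelihood. Returns the intercept a and slope b."""
    a = b = 0.0
    n = len(x)
    for _ in range(steps):
        grad_a = grad_b = 0.0
        for xi, yi in zip(x, y):
            p = 1.0 / (1.0 + math.exp(-(a + b * xi)))
            grad_a += yi - p
            grad_b += (yi - p) * xi
        a += lr * grad_a / n
        b += lr * grad_b / n
    return a, b


def predicted_probability(a, b, score):
    """Model-implied probability of attrition at a given predictor score."""
    return 1.0 / (1.0 + math.exp(-(a + b * score)))


# Hypothetical data: standardized composite scores and attrition (1 = left).
scores = [-2, -2, -1, -1, 0, 0, 1, 1, 2, 2]
left = [1, 0, 1, 0, 0, 0, 0, 0, 0, 0]

a, b = fit_logit(scores, left)
print(f"intercept = {a:.2f}, slope = {b:.2f}")
print(f"P(leave | score = -2) = {predicted_probability(a, b, -2):.2f}")
print(f"P(leave | score = +2) = {predicted_probability(a, b, 2):.2f}")
```

Unlike a linear regression on a binary criterion, the fitted probabilities stay between zero and one, which matters most when the base rate is as low as it is here.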

Acknowledgement 

Thanks  to  Clinton  B.  Walker,  U.S.  Army  Research  Institute,  for 
providing  the  attrition  codes  used  in  this  research. 

References 

Aldrich, J. H., & Nelson, F. D. (1984). Linear probability, logit, and
probit models. Beverly Hills: Sage Publications.

Alley, W. E., & Matthews, M. D. (1982). The Vocational Interest Career
Examination. Journal of Psychology, 112, 169-193.

Campbell, J. P. (1986, August). Project A: When the textbook goes
operational. Invited address at the 94th Annual Convention of the
American Psychological Association, Washington, DC.

Gough, H. G. (1975). Manual for the California Psychological Inventory.
Palo Alto, CA: Consulting Psychologists Press.

Hough, L. M., Dunnette, M. D., Wing, H., Houston, J., & Peterson, N. G.
(1984, August). Covariance analyses of cognitive and noncognitive
measures in Army recruits: An initial sample of Preliminary Battery
data. Paper presented at the 92nd Annual Convention of the American
Psychological Association, Toronto, Ontario, Canada.

Jackson, D. N. (1967). Personality Research Form Manual. Goshen, NY:
Research Psychologists Press.

McCormick, E. J., Jeanneret, P. R., & Mecham, R. C. (1972). A study of job
characteristics as based on the Position Analysis Questionnaire (PAQ).
Journal of Applied Psychology Monographs, 56, 347-368.

McLaughlin, D. H., Rossmeissl, P. G., Wise, L. L., Brandt, D. A., & Wang, M.
(1984). Validation of current and alternative Armed Services
Vocational Aptitude Battery (ASVAB) area composites. (Technical Report
651). Alexandria, VA: U.S. Army Research Institute.

Owens, W. A., & Schoenfeldt, L. F. (1979). Toward a classification of
persons. Journal of Applied Psychology Monographs, 64, 569-607.

Rotter, J. B. (1966). Generalized expectancies for internal versus
external control of reinforcement. Psychological Monographs, 80 (1,
Whole No. 609).

Tellegen, A. (1982). Brief Manual for the Differential Personality
Questionnaire. Unpublished manuscript, University of Minnesota,
Minneapolis.

Wise, L. L., & Mitchell, K. J. (1985, August). Development of an index
of maximum validity increment for new predictor measures. Paper
presented at the 93rd Annual Convention of the American Psychological
Association, Los Angeles, CA.




Table 1. Predicting Military Occupation Membership with
Selected VOICE Scales


                                 Predicted Occupation (5 Scales)*

Actual             Percent of    Tank      Radio                 Admin.
Occupation         Total Group   Crewman   Operator   Mechanic   Spec.

Tank Crewman           27         [?]        27         17          9
Radio Operator         21          14       [?]          7         10
Mechanic               23          31        22        [69]         5
Admin. Specialist      29          12        25          7       [76]

                                 Predicted Occupation (7 Scales)*

Tank Crewman           27         [?]        27         19          9
Radio Operator         21          14      [13]          8         10
Mechanic               23          28        18        [?]          6
Admin. Specialist      29          12        25          7        [?]

*The five scales used were Heavy Construction, Marksman, Electronics,
Office Administration, and Mechanic. The seven scales were the above
five plus Automated Data Processing and Mathematics.

Note: Entries in table are percentages. Bracketed figures are correct
predictions (boxed in the original); [?] marks a boxed figure that is
not legible in the source. Columns sum to 100 within rounding errors;
rows do not.




Table 2. Predicting Overall Attrition with Traditional
and Noncognitive Predictors


                             Predictors Used in Prediction Regression

Traditional
  AFQT                          Yes         No          Yes
  High School Graduation        Yes         No          Yes

Noncognitive
  Rule Abiding                  No          Yes         Yes
  Socialization                 No          Yes         Yes
  Stress Reaction               No          Yes         Yes
  Academic Achievement          No          Yes         Yes

Adjusted R                      .08         .15         .15

Sample Size                    8,352       8,198       7,480



Table 3. Predicted Percent Attrition for Different Cut-off Scores


Predicted Attrition                             Percent of Total
Score* (Standardized)    Attrition Rate in      Group in Group
Cut-off                  Below Cut-off Group    Below Cut-off

        20                      .25                    1
        25                      .28                    2
        30                      .23                    3
        35                      .17                    8
        40                      .15                   16
        45                      .12                   31
        50                      .10                   50
        55                      .08                   69
        60                      .08                   85
        65                      .07                   95
        70                      .07                   99
        75                      .07                  100

*Composed of Rule Abiding, Socialization,
Stress Reaction, and Academic Achievement
Scales.




Table 4. Attrition Status By Military Occupation

[Table body not legible in the source scan.]

**Adjusted multiple correlation between binary
Attrition Status and four biodata-temperament
predictor scales plus five interest scales.


IDENTIFYING  OPTIMAL  PREDICTOR  COMPOSITES  AND 
TESTING  FOR  GENERALIZABILITY 
ACROSS  JOBS  AND  PERFORMANCE  CONSTRUCTS 


Lauress  L.  Wise 

American  Institutes  for  Research 


John  P.  Campbell 

Human  Resources  Research  Organization 


Norman G. Peterson

Personnel  Decisions  Research  Institute 


Presented  at  the  Annual  Conference  of  the 
Society  for  Industrial  and  Organizational  Psychology 

Atlanta,  Georgia 

April  1987 


The  views  expressed  in  this  paper  are  those  of  the  authors  and  do  not 
necessarily  reflect  the  official  opinions  and  policies  of  the  U.S.  Army 
Research  Institute  or  the  Department  of  the  Army. 




Identifying Optimal Predictor Composites
And Testing for Generalizability
Across Jobs and Performance Constructs


Industrial psychologists have long been concerned with
the problem of matching people to jobs. For a long time, the
implicit model in this enterprise was essentially a peg and
hole model with job applicants represented by different sizes
and shapes of pegs and job openings represented by different
sizes and shapes of holes. The goal was to match the pegs to
the holes. The primary conclusion drawn from this model, as
expressed by Ghiselli (1966), was that different job performance
prediction measures and different selection criteria
should be developed and validated for different jobs and job
environments.

More recently, Schmidt and Hunter (1981) created a
paradigm shift when they showed convincingly that a large
part of the situational variance in prediction validities is
attributable not to differences in job requirements, but to
methodological artifacts. Validity generalization is now a
household word in Industrial and Organizational psychology.
As the term has come to be used, it refers to properties of a
distribution of criterion-related validity coefficients
generated by using one or more measures of the same construct
to predict general job performance within broad families of
jobs. The interesting parts of the distribution are its
overall mean and the degree to which its variance can be
accounted for by statistical artifacts (e.g., criterion
unreliability, sampling error) vs. the substantive characteristics
of different situations (e.g., different abilities are
required by different jobs). Arguments continue as to how to
define the appropriate population of coefficients and how
large the substantive variance has to be before we should
worry about it.

Much of the discussion of validity generalization has
focused on the prediction of overall job performance using a
measure of general mental ability. Not much attention has
been devoted to whether different predictor constructs or
different performance components define different populations
for validity generalization purposes. From this new
perspective, all the pegs are the same shape and the only question
is whether they are big enough to fill any particular hole.

In Project A, we have been building a model for both
sides of the equation. That is, we are attempting to define
the total domain of potentially useful prediction information,
describe it in terms of its basic constructs, and then develop
representative measures of those constructs (Peterson, Hough,
Dunnette, Rosse, Houston, Toquam & Wing, 1987). Similarly
for the criterion side, we have tried to define the total
domain of job performance, describe it too in terms of basic
factors, and use multiple methods to provide scores on each
performance factor (Campbell, Felker, Borman & Rumsey, 1987;
Campbell, McHenry & Wise, 1987; Wise, Campbell, McHenry &
Hanser, 1986).

Stated simply, our working theory is that performance is
not one thing and that the correlations between the major
components of performance do not approach the limits of their
reliabilities. Similarly, at least some of the basic predictor
constructs (latent variables) that account for individual
differences at the time of hire are also not highly
intercorrelated. As part of Project A, each of these domains has
been modeled and measured for a diverse and representative sample
of entry-level jobs. Specifically, concurrent validity data
have been collected for over 4,000 soldiers in a core sample
of nine jobs. With these data we can address questions
such as: How do the validities for each of several predictor
constructs generalize across different components of
performance? How do the validities of the predictor battery for a
particular performance component generalize across jobs? It
is to these questions that we now turn.


Method 


The data analyzed for this paper included scores for
five job performance constructs and twenty-four predictor
constructs collected on a sample of several hundred soldiers
in each of nine different jobs. Young, Harris, Hoffman &
Wise (1987) have described the collection and editing of
these data. Table 1 lists the predictor and performance
construct scores. Table 2 lists the nine Army jobs and gives
the number of soldiers included in the present analyses.


Analyses 

Sample covariance matrices, including both the five
criteria and the twenty-four predictors, were computed for
each of the nine jobs. An overall covariance matrix was
computed as the average of these nine matrices, weighting
each by the corresponding sample size. This form of pooling
was necessary because the criterion measures were composed of
somewhat different items for the different jobs, so that it
was not possible to assure fully comparable scaling across
jobs.
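The pooling step described above can be sketched in a few lines. This is a minimal illustration only: the function name, the two small matrices, and the choice of n/N weights are assumptions made for the example, not details taken from the Project A analyses.

```python
def pooled_covariance(matrices, sample_sizes):
    """Sample-size-weighted average of same-shaped covariance matrices.

    Assumption for this sketch: each job's matrix is weighted by n / N,
    where n is that job's sample size and N is the total sample size.
    """
    if not matrices or len(matrices) != len(sample_sizes):
        raise ValueError("need one sample size per covariance matrix")
    total_n = sum(sample_sizes)
    rows, cols = len(matrices[0]), len(matrices[0][0])
    pooled = [[0.0] * cols for _ in range(rows)]
    for matrix, n in zip(matrices, sample_sizes):
        for i in range(rows):
            for j in range(cols):
                pooled[i][j] += n * matrix[i][j] / total_n
    return pooled


# Two hypothetical jobs with 2 x 2 covariance matrices (made-up numbers).
job_a = [[4.0, 1.0], [1.0, 9.0]]
job_b = [[2.0, 0.5], [0.5, 3.0]]
pooled = pooled_covariance([job_a, job_b], [300, 100])
print(pooled)
```

Because the average is taken within jobs first, differences between jobs in criterion means or scaling do not inflate the pooled covariances.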


In the present study, covariances were analyzed as a
means of controlling for differences between jobs in
heterogeneity with respect to the predictor measures. Initial
selection into each of the nine jobs included an absolute
screen on a composite of the subtests from the Armed Services
Vocational Aptitude Battery (ASVAB). Different composites
and different selection ratios (cutting points) were used for
different jobs. For some jobs, the cutoff point was at the
population mean, while for others a cutoff as much as .75
s.d. below the population mean was used. In addition to this
absolute screen, self-selection and attrition during and
after training served to further reduce the heterogeneity of
our samples. By including the predictor covariances as an
explicit part of our modeling, differences in heterogeneity
were accounted for.
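To see why different cutting points produce different amounts of heterogeneity, the following sketch computes the variance that remains when a standard normal score is truncated from below. It assumes, purely for illustration, that the screening composite is normally distributed with unit variance in the applicant population and that the screen is a strict lower cutoff; the function names are hypothetical.

```python
import math


def normal_pdf(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * math.erfc(-z / math.sqrt(2.0))


def variance_after_cutoff(cutoff):
    """Variance of a standard normal score among cases above `cutoff`.

    Uses the truncated-normal result var = 1 + a*h - h**2, where
    h = pdf(a) / (1 - cdf(a)) and a is the cutoff in s.d. units.
    """
    hazard = normal_pdf(cutoff) / (1.0 - normal_cdf(cutoff))
    return 1.0 + cutoff * hazard - hazard * hazard


# The two cutoffs mentioned in the text: the population mean and .75 s.d. below it.
var_at_mean = variance_after_cutoff(0.0)
var_below_mean = variance_after_cutoff(-0.75)
print(round(var_at_mean, 3), round(var_below_mean, 3))
```

Under these assumptions, a job screened at the mean retains noticeably less predictor variance than one screened .75 s.d. below it, which is the kind of between-job difference the covariance-based modeling is meant to absorb.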

The LISREL program (Joreskog & Sorbom, 1981) was used to
model the predictor-criterion relationships. This program
enables direct statistical tests of the degree to which
observed variation in parameter estimates might be due simply
to sampling error. The LISREL program also allows separate
modeling of the statistical properties, including specifically
reliabilities, of the measures analyzed. It is thus possible
to eliminate both sampling error and differences in criterion
reliability as artifactual sources of variation in
predictor-criterion relationships.

In applying LISREL to the present problem, the covariance
matrices were divided into three components. The first, the
covariances among the predictors, is modeled by the Phi
matrix in LISREL. In all of our analyses, the Phi matrix was
left unconstrained because differences due to selection were
fully anticipated. The second component is the
predictor-criterion covariances. In LISREL, these are modeled in the
Gamma matrix as regression or structural equations. These
equations are used to estimate criterion scores from predictor
scores. Most of our analyses consisted of testing possible
constraints on this matrix (i.e., constancies across criterion
constructs or across job samples). The final component into
which the covariance matrices were divided was the Psi matrix.
Psi contains the covariances among the unique/error portion
of the criterion measures. In LISREL, the observed criterion
covariances are modeled as the sum of the covariances among
estimated criterion scores and the covariances among the
error/unique components of the criterion variables.
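The three-part decomposition described above can be written compactly. The notation below is a standard way of expressing a regression-type structural model and is offered as a sketch: it treats predictors x and criteria y as directly observed and leaves out the separate modeling of reliabilities mentioned earlier.

```latex
\begin{aligned}
y &= \Gamma x + \zeta, \qquad
\operatorname{Cov}(x) = \Phi, \quad
\operatorname{Cov}(\zeta) = \Psi, \quad
\operatorname{Cov}(x, \zeta) = 0, \\
\Sigma_{xx} &= \Phi, \\
\Sigma_{yx} &= \Gamma \Phi, \\
\Sigma_{yy} &= \Gamma \Phi \Gamma^{\top} + \Psi .
\end{aligned}
```

The last line is the statement that the observed criterion covariances equal the covariances among the estimated criterion scores plus the covariances among the unique/error components.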

The first step in our analyses was to reduce the number
of predictor scores included in the model. This was done to
simplify our representation of predictor-criterion
relationships and to make the subsequent structural equations more
stable. The approach used was to successively eliminate
predictors and then examine whether all of the
predictor-criterion covariances could be adequately reproduced without
including the eliminated predictors in any of the structural
(regression) equations. In such a case, the predictors in
question could be dropped without loss of any predictive
information.

The second step was to test for criterion equivalence.
If two or more criterion constructs shared a common set of
relationships to the predictors, then further reduction of
the criterion space would be possible. For each pair of
criterion constructs, a model was fit to the combined
covariance matrix in which the regression coefficients for each
predictor were constrained to be the same in both criterion
equations. If this model was not rejected by the data, then
the criteria could be combined without loss of information
concerning predictor-criterion relationships.

The third and final step in our analyses was to test for
equivalence in the prediction equations for different jobs.
For each distinct performance construct, a model with a
constant prediction equation across all nine jobs was tested.


Results 

Table 3 shows standardized regression coefficients for
predicting each criterion construct from the entire set of
predictor constructs. This was our starting point for reducing
the number of different predictors considered. We examined
the significance of each of these coefficients, using the t
statistics provided by LISREL (testing for difference from
zero). We eliminated those predictors that did not have a
significant coefficient (t > 2.0) for any of the criteria.
This resulted in a model which did not quite fit the overall
covariance matrix, so we put those predictors with the largest
modification indices back into the equations. (Note that
each predictor was either in all of the equations or none at
this stage.) In the end, five predictors were eliminated.
Each of the remaining nineteen predictors had a significant
loading on at least one of the criteria. Table 4 shows the
reduced structural equations and gives fit statistics for
this reduced model.
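The degrees of freedom of the reduced model can be checked by counting constraints. The sketch below assumes that the full regression model, with Phi and Psi left free, reproduces the covariance matrix exactly, so that each coefficient fixed to zero contributes one degree of freedom; the function name is hypothetical.

```python
def zero_constraint_df(n_predictors_total, n_predictors_kept, n_criteria):
    """Degrees of freedom gained by dropping predictors from every equation.

    Assumption for this sketch: the model with all predictors in all
    equations is saturated, so df equals the number of regression
    coefficients fixed to zero.
    """
    full = n_predictors_total * n_criteria
    reduced = n_predictors_kept * n_criteria
    return full - reduced


# Twenty-four predictors, nineteen retained, five criterion constructs.
df_reduced_model = zero_constraint_df(24, 19, 5)
print(df_reduced_model)
```

The count agrees with the degrees of freedom reported in the note to Table 4.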

Table 5 shows the results of the tests for criterion
equivalence. In all cases, separate prediction equations were
indicated. The Core Technical Proficiency and General
Soldiering Proficiency constructs were the most similar. The three
primary differences between the equations for these two
constructs, as seen in Table 5, were: (1) the distinctly
greater significance of the Combat Interest measure for
predicting General Soldiering Proficiency; (2) the somewhat
greater significance of spatial and quantitative skills for
General Soldiering Proficiency; and (3) the somewhat greater
significance of verbal skills for Core Technical Proficiency.
It also was the case that high scores on the Physical
Conditioning predictor were related to lower Core Technical
Proficiency scores but not to lower General Soldiering
Proficiency scores. The differences between the equations
for the other criterion equations were all in expected
directions and fully consistent with the general findings reported
by McHenry et al. (1987).

Further analyses were conducted to identify optimal sets
of predictors for each of the criterion constructs. An
initial model, in which only cognitive and perceptual tests
were used in predicting proficiency and only interest,
temperament, and biographical measures were used in predicting the
motivational constructs, did not fit the data adequately. A
number of iterations were performed with changes based on the
data. Table 6 shows the predictors and their standardized
coefficients that were judged to best fit the data. Given
some reliance on empirical results in identifying this model,
the significance level should not be overinterpreted.

Table 7 shows the results of tests for equivalent
prediction equations across jobs for each criterion construct. In
these analyses, a reduced set of predictors was used for each
performance construct. This was done partly because
multisample runs can otherwise be inordinately expensive and
partly in an effort to achieve some semblance of parsimony.

For the three "will do" performance constructs (Effort
and Leadership, Maintaining Personal Discipline, and Physical
Fitness and Military Bearing), the hypothesis that one equation
fits all jobs could not be rejected from the available data
(shown by chi-square statistics with p values greater than
.05). For the General Soldiering Proficiency construct, the
p value fell between .01 and .05, suggesting at most very
modest differences in the prediction equations across jobs.
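The p values discussed here can be checked approximately from the chi-square statistics and degrees of freedom in Table 7. The sketch below uses the Wilson-Hilferty normal approximation rather than an exact chi-square routine, so small discrepancies in the third decimal are expected; the function name is hypothetical and the input values are transcribed from Table 7.

```python
import math


def chi_square_p_value(chi_square, df):
    """Approximate upper-tail p value (Wilson-Hilferty approximation)."""
    if chi_square <= 0 or df <= 0:
        raise ValueError("chi_square and df must be positive")
    spread = 2.0 / (9.0 * df)
    z = ((chi_square / df) ** (1.0 / 3.0) - (1.0 - spread)) / math.sqrt(spread)
    return 0.5 * math.erfc(z / math.sqrt(2.0))


# (chi-square, df) pairs as read from Table 7.
table_7 = {
    "Core Technical Proficiency": (220.5, 65),
    "General Soldiering Proficiency": (80.7, 57),
    "Effort and Leadership": (69.7, 57),
    "Personal Discipline": (91.3, 73),
    "Physical Fitness/Mil. Bearing": (111.3, 89),
}

for criterion, (chi_square, df) in table_7.items():
    print(f"{criterion:32s} {chi_square_p_value(chi_square, df):.3f}")
```

Only the Core Technical Proficiency value falls far below conventional thresholds, which matches the pattern described in the text.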

For Core Technical Proficiency, however, the common
prediction equation model was strongly rejected. Table 8
shows the separate prediction equations estimated for each
job. Table 9 shows chi-square fit statistics and p values
for each pair of jobs considered by themselves. For some
pairs of jobs, such as the combat jobs, the optimal prediction
equations were not significantly different. For other jobs,
however, quite significant differences were found. The largest
difference was between Vehicle Mechanics and Administrative
Specialists.
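One simple way to see which jobs depart most from the common equation is to compare the two R-squared rows of Table 8. The sketch below does only that subtraction; the values are transcribed from Table 8, the variable names are hypothetical, and the comparison is descriptive rather than a significance test.

```python
# R-squared for Core Technical Proficiency, as read from Table 8.
separate = {"11B": 0.350, "13B": 0.163, "19E": 0.291, "31C": 0.224, "63B": 0.349,
            "64C": 0.225, "71L": 0.300, "91A": 0.291, "95B": 0.136}
combined = {"11B": 0.336, "13B": 0.126, "19E": 0.283, "31C": 0.184, "63B": 0.270,
            "64C": 0.197, "71L": 0.186, "91A": 0.253, "95B": 0.117}

# Loss in explained variance when the common equation replaces the job-specific one.
loss = {mos: round(separate[mos] - combined[mos], 3) for mos in separate}
ranked = sorted(loss, key=loss.get, reverse=True)

for mos in ranked:
    print(mos, loss[mos])
```

The two jobs with the largest loss are the Administrative Specialists (71L) and the Vehicle Mechanics (63B), the same pair singled out in the text.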

The results presented in Tables 8 and 9 suggest that
there are significant differences between the requirements of
mechanical/technical jobs, clerical/administrative jobs, and
combat jobs. It is not clear whether there are differences
in job requirements within each of these three major job
types, but our data suggest that this might be the case.


Discussion and Summary


The first general result from these analyses is that
there are different components of job performance, even
within entry-level positions, that show different patterns of
relationships with potential predictor measures. The
predictors of job proficiency were, for the most part, quite distinct
from predictors of effort and leadership, avoidance of
disciplinary problems, and physical fitness/military bearing.
These results generally supported our perspective that job
performance is, indeed, multidimensional and not just one
thing. One consequence of this result is that the assessment
of overall job effectiveness necessarily involves policy
decisions regarding the relative importance of the different
components of job performance in a particular setting.
Project A staff are now in the process of collecting such
judgments for each of the jobs included in our sample.

The second general result is that different mixes of
skills, interests, temperament, and background must be used to
obtain optimal prediction of technical proficiency in
different jobs. These results are particularly important for an
organization like the Army, which must select and train
untrained individuals for many different jobs. Individual
differences in job knowledge prior to training would not
necessarily be related to differences in job proficiency
after training, yet we still find significant differentiation
in predictors of post-training job performance. These results
suggest that the tests studied would be useful for classifying
new Army recruits into jobs that are best suited to their
abilities, temperament, and interests.

The results of the present analyses undoubtedly understate
differences between jobs in a number of ways. First, the
common Army experience and environment shared by the soldiers
who served as subjects in Project A probably increases
similarities in job requirements. To succeed in the Army,
all soldiers must pass basic training and advanced technical
training. All soldiers must learn and adhere to Army customs
and tradition. And, to at least some degree, all soldiers
must subscribe to Army values. These similarities attenuated
the differences in job requirements that we uncovered in our
analyses. Second, all soldiers share a number of
responsibilities, regardless of their assigned MOS. These shared
responsibilities are reflected in the job performance
constructs: General Soldiering Proficiency, Effort and Leadership,
Personal Discipline, and Physical Fitness and Military Bearing.
These performance constructs were intended to capture job
performance components that are common to all soldiers.
Thus, we were not surprised to discover that the optimal
predictors of these performance constructs were the same
across all jobs, even though it reduced our power to
differentiate between jobs on the basis of skill and trait
requirements. Third, we have studied only entry-level
positions within each of these jobs. Increasing differentiation
between jobs seems likely as incumbents graduate to more
skilled positions. This possibility is being addressed in
continuing Project A activities aimed at assessing the
performance of more experienced job incumbents.

One final caveat is a reminder that, with the exception
of the ASVAB scores, the predictor and criterion data analyzed
here were collected concurrently. Differences in predictive
relationships may have been either worn down by common Army
experience or accentuated through differences in training and
on-the-job experiences. We are now engaged in the longitudinal
phase of Project A, which will allow us to assess the range of
opportunities for matching individuals to jobs even more
conclusively than was possible in the present analyses.


References

Campbell, C.G., Borman, W.C., Felker, D.C., Ford, P., Park,
M.D., Pulakos, E.C., Riegelhaupt, B.J., & Rumsey, M.G.
(1987, April). Development of Project A job performance
measures. Paper presented at the Second Annual Conference
of the Society for Industrial and Organizational
Psychology, Atlanta, GA.

Campbell, J.P., McHenry, J.J., & Wise, L.L. (1987, April).
Analysis of criterion measures: The modeling of
performance. Paper presented at the Second Annual Conference
of the Society for Industrial and Organizational
Psychology, Atlanta, GA.

Ghiselli, E.E. (1966). The validity of occupational aptitude
tests. New York: Wiley.

Joreskog, K.G., & Sorbom, D. (1981). LISREL VI user's guide.
Mooresville, IN: Scientific Software.

McHenry, J.J., Hough, L.M., Toquam, J.L., Hanson, J.A., &
Ashworth, S. (1987, April). Project A validity results:
The relationship between predictor and criterion domains.
Paper presented at the Second Annual Conference of the
Society for Industrial and Organizational Psychology,
Atlanta, GA.

Peterson, N.G., Hough, L.M., Dunnette, M.D., Rosse, R.L.,
Houston, J.S., Toquam, J.L., & Wing, H. (1987, April).
Identification of predictor constructs and development
of new selection/classification tests. Paper presented
at the Second Annual Conference of the Society for
Industrial and Organizational Psychology, Atlanta, GA.

Schmidt, F.L., & Hunter, J.E. (1981). Employment testing:
Old theories and new research findings. American
Psychologist, 36, 1128-1137.

Wise, L.L., Campbell, J.P., McHenry, J.J., & Hanser, L.M.
(1986, August). A latent structure model of job
performance factors. Paper presented at the 92nd Annual
Convention of the American Psychological Association,
Washington, D.C.

Young, W.Y., Harris, J.H., Hoffman, R.G., & Wise, L.L. (1987,
April). Large scale data collection and data base
preparation. Paper presented at the Second Annual
Conference of the Society for Industrial and
Organizational Psychology, Atlanta, GA.


Table 1

Predictor and Job Performance Constructs


                    Predictor Constructs

Type of Measure             Construct Name
--------------------------  -------------------------------------
Armed Services              Quantitative
Voc. Aptitude (ASVAB)       Speed
                            Technical
                            Verbal

Project A: Cognitive        Spatial

Project A: Percept.         Complex Perceptual Accuracy
and Psychomotor             Complex Perceptual Speed
                            Numerical Speed and Accuracy
                            Psychomotor Ability
                            Simple Reaction Accuracy
                            Simple Reaction Speed

Project A: Temperament      Adjustment
                            Dependability
                            Physical Conditioning
                            Achievement Orientation

Project A: Interest         Audio/Visual Interest
                            Combat Interest
                            Food Service Interest
                            Protective Service Interest
                            Skilled Technical Interest
                            Structural/Machines Interest

Project A: Job              Job Autonomy
Orientation                 Organizational/Coworker Support
                            Routine Work


                    Performance Constructs

Type of Measure             Construct Name
--------------------------  -------------------------------------
Hands-on and Written        Core Technical Proficiency
Tests                       General Soldiering Proficiency

Administrative              Effort and Leadership
Measures and Ratings        Personal Discipline
                            Physical Fitness and Military Bearing


Table 2

Studied


Enlisted Job                      Number of Incumbents
--------------------------------  --------------------
Infantryman                        491
Cannon Crewmember                  464
Armor Crewmember                   394
Single Channel Radio Operator      289
Light Wheel Vehicle Mechanic       478
Motor Transport Operator           507
Administrative Specialist          427
Medical Specialist                 392
Military Police                    597

Total                             4039


Table 3

Standardized Regression Coefficients for Each Criterion Against All Predictors
(Based on Pooled Covariance Matrix, n=4039)


                           Core         General                                Physical
                           Technical    Soldiering   Effort and   Personal     Fitness/
Predictor                  Proficiency  Proficiency  Leadership   Discipline   Mil. Bearing
-------------------------------------------------------------------------------------------
Quantitative                .097         .130         .012         .063         .023
Speed                       .020        -.011         .062         .032         .088
Technical                   .103         .141         .155         .082        -.038
Verbal                      .116         .098        -.080        -.029        -.106
Spatial                     .196         .279         .014        -.005        -.021
Complex Perc. Accy.         .085         .112         .046         .026         .015
Complex Perc. Speed         .032         .039         .052         .033         .032
Num. Speed/Accy.            .032         .020         .016        -.028        -.026
Psychomotor                 .003         .047         .020        -.029        -.011
Simple Reaction Accy.       .012        -.004        -.020         .011        -.036
Simple Reaction Speed       .026         .028        -.014        -.019         .044
Adjustment                 -.004         .000        -.004         .001         .025
Dependability               .127         .128         .119         .314         .099
Physical Condition         -.053        -.007         .008        -.054         .248
Achiev. Orient.            -.003        -.045         .221         .040         .131
Audio/Visual Interest      -.054        -.008        -.026        -.025         .038
Combat Interest             .103         .167         .117         .017         .042
Food Service Interest      -.050        -.036        -.060        -.042        -.021
Prot. Service Interest     -.009         .003         .011        -.033        -.051
Skilled Tech. Interest     -.010        -.020        -.031        -.009         .009
Structural/Machine Int.     .054        -.007        -.011        -.026        -.052
Job Autonomy                .014        -.007         .000        -.042        -.046
Job Support                 .034         .023        -.020        -.020         .017
Routine Work               -.023        -.039        -.037        -.015        -.010

R-Squared                   .223         .305         .146         .114         .160

NOTE: Values are maximum likelihood estimates which may differ slightly from OLS estimates
presented elsewhere. Also, no attempt was made to estimate parameters for the unselected
population.


Table 4

Standardized Regression Coefficients for Each Criterion Against Predictor Set
(Based on Pooled Covariance Matrix, n=4039)


                           Core         General                                Physical
                           Technical    Soldiering   Effort and   Personal     Fitness/
Predictor                  Proficiency  Proficiency  Leadership   Discipline   Mil. Bearing
-------------------------------------------------------------------------------------------
Quantitative                .068         .119         .002         .058         .018
Speed                       .025        -.006         .065         .033         .091
Technical                   .102         .142         .159         .064        -.039
Verbal                      .123         .106        -.075        -.027        -.101
Spatial                     .205         .293         .034         .007        -.012
Complex Perc. Accy.         .070         .093         .022         .011         .000
Complex Perc. Speed         .            .            .            .            .
Num. Speed/Accy.            .039         .027         .024        -.023        -.019
Psychomotor                 .007         .051         .025        -.025        -.007
Simple Reaction Accy.       .012        -.004        -.020         .011        -.037
Simple Reaction Speed       .033         .036        -.004        -.012         .052
Adjustment                  .            .            .            .            .
Dependability               .126         .123         .108         .308         .100
Physical Condition         -.053        -.006         .011        -.053         .249
Achiev. Orient.             .005        -.034         .220         .039         .153
Audio/Visual Interest      -.054        -.014        -.041        -.030         .044
Combat Interest             .102         .167         .118         .017         .042
Food Service Interest      -.058        -.047        -.069        -.045        -.023
Prot. Service Interest     -.006         .005         .006        -.034        -.048
Skilled Tech. Interest      .            .            .            .            .
Structural/Machine Int.     .051        -.014        -.019        -.029        -.052
Job Autonomy                .021        -.002        -.004        -.047        -.045
Job Support                 .            .            .            .            .
Routine Work                .            .            .            .            .

R-Squared                   .221         .303         .143         .113         .159

NOTE: Chi-Square = 35.07, df=25, p=.09. A "." denotes that the predictor was
not included in the regression model.

Table 5


                                Core         General
                                Technical    Soldiering   Effort and   Personal
                                Proficiency  Proficiency  Leadership   Discipline
---------------------------------------------------------------------------------
General Soldiering Proficiency     84.6
Effort and Leadership             374.5        416.6
Personal Discipline               513.1        754.2        566.6
Physical Fitness/Mil. Bearing     973.2       1230.9        559.6        542.0

NOTE: Chi-squares had p values less than .001, indicating rejection
of the common predictor equation model.


Table 6


                           Core         General                                Physical
                           Technical    Soldiering   Effort and   Personal     Fitness/
Predictor                  Proficiency  Proficiency  Leadership   Discipline   Mil. Bearing
-------------------------------------------------------------------------------------------
Quantitative                .084         .114         .            .055         .
Speed                       .027         .            .045         .            .077
Technical                   .101         .137         .140         .059        -.047
Verbal                      .132         .113        -.061         .           -.092
Spatial                     .212         .295         .034         .            .
Complex Perc. Accy.         .064         .086         .            .            .
Num. Speed/Accy.            .049         .035         .038         .            .
Psychomotor                 .            .054         .041         .            .
Simple Reaction Accy.       .            .            .            .           -.031
Simple Reaction Speed       .            .            .            .            .054
Dependability               .127         .124         .112         .314         .100
Physical Condition         -.049         .            .           -.062         .245
Achiev. Orient.             .           -.040         .222         .041         .356
Audio/Visual Interest      -.046         .           -.043        -.034         .039
Combat Interest             .097         .158         .108         .            .036
Food Service Interest      -.063        -.051        -.060        -.038         .
Prot. Service Interest      .            .            .           -.040        -.053
Structural/Machine Int.     .059         .            .            .           -.047
Job Autonomy                .            .            .           -.046        -.041

R-Squared                   .220         .302         .140         .110         .155

NOTE: Chi-Square = 32.86, df=38, p=.71. A "." denotes that the predictor was
not included in the regression model for that criterion.


Table 7

Test for Common Prediction Equations Across Nine Jobs


CRITERION                         CHI-SQUARE    DF      P
----------------------------------------------------------
Core Technical Proficiency           220.5      65    .000
General Soldiering Proficiency        80.7      57    .02
Effort and Leadership                 69.7      57    .12
Personal Discipline                   91.3      73    .07
Physical Fitness/Mil. Bearing        111.3      89    .06

Table 8

Standardized Regression Coefficients for Predicting Core Technical Proficiency Overall and for Each Job


                     11B:     13B:     19E:     31C:     63B:     64C:     71L:     91A:     95B:
                     Infantry Cannon   Armor    Radio    Vehicle  Truck    Admin.   Medic    Military
Predictor                     Crew     Crew     Oper.    Mechanic Driver   Spec.    Spec.    Police   All
---------------------------------------------------------------------------------------------------------
ASVAB
  Quantitative        .101     .030     .038     .247    -.018     .058     .301     .053     .106     .096
  Technical           .105     .002     .164     .133     .436     .284    -.179     .088     .036     .101
  Verbal              .162     .067     .100     .127    -.030    -.021     .088     .245     .110     .100

Project A
  Spatial             .250     .218     .209     .101     .080     .163     .250     .225     .177     .197
  Complex Perc. Accy. .045     .052     .101     .095    -.028     .093     .149     .019     .078     .063
  Dependability       .066     .056     .080     .118     .071     .079     .144     .226     .116     .099
  Combat Interest     .164     .210     .156     .034     .157     .062    -.062     .106    -.010     .097
  Food Serv. Int.    -.086    -.101    -.052     .060    -.057    -.041    -.048    -.018    -.067    -.057

R-Squared
  Separate Pred.      .350     .163     .291     .224     .349     .225     .300     .291     .136     n/a
  Combined Pred.      .336     .126     .283     .184     .270     .197     .186     .253     .117     .214

[Table 9, chi-square fit statistics and p values for each pair of jobs, is not legible in the source.]

LARGE-SCALE  DATA  COLLECTION  AND 
DATA  BASE  PREPARATION 


Winnie  Y.  Young 

American Institutes for Research
Janis  S.  Houston 

Personnel  Decisions  Research  Institute 

James  H.  Harris 
R.  Gene  Hoffman 

Human  Resources  Research  Organization 
Lauress  L.  Wise 

American  Institutes  for  Research 


Presented  at  the  Annual  Conference  of  the 
Society  for  Industrial  and  Organizational  Psychology 

Atlanta,  Georgia 

April  1987 


The views expressed in this paper are those of the authors and do not
necessarily reflect the official opinions and policies of the U.S. Army
Research Institute or the Department of the Army.




Large  Scale  Data  Collection  and  Data  Base  Preparation 


In the summer and fall of 1985, data were collected for
the concurrent validation phase of the Army's Project A. An
extensive array of predictor and criterion measures was
administered to approximately 9,500 entry-level soldiers, and
ratings of these soldiers' performance were also obtained
from approximately 7,000 supervisors. The original Project A
Research Plan specified a concurrent validation target sample
size of 600 job incumbents for each of the 19 jobs or
Military Occupational Specialties (MOS), using procedures that
had been tried out and refined during the predictor and
criterion field tests. The Research Plan further specified
that data would be collected at 13 separate sites (Army
posts) in the United States and two in Europe. Individual
sites were selected on a basis that maximized the probability
of obtaining the target sample sizes without exceeding the
project budget.

The logistics involved in such a large-scale data
collection effort are fairly complicated, and the sheer volume
and complexity of the resultant data base make it a
challenge to ensure that the data available for analysis will
be of the highest quality possible. This paper describes our
attempt to meet the logistical demands of the concurrent
validation data collection and the procedures we used to
assemble and edit the resultant data base.


Data  Collection 


Sampling  Plan 

The general sampling plan was to use the Army's
World-Wide Locator System to identify all the first-term
enlisted personnel in the 19 target MOS at each of the 15
selected sites who entered the Army between 1 July 1983 and
30 July 1984. The intent was to represent as many Army
"units" as possible while preserving enough cases within
units to provide a "within rater" variance estimate for the
supervisor and peer ratings of job performance. A two-step
sampling procedure was followed.

1.  For each site, identify a subset of the 19 target
    MOS from which it would be possible to draw a large
    enough sample. That is, given the entry date
    "window" and given that only 50-70 percent of the
    people on any list of potential subjects could
    actually be found and tested, what MOS are large
    enough to warrant sampling them at that site?

2.  For each MOS in the subset identified above,
    identify the smallest "unit" from which 6-10 people
    can be drawn. Ideally, we wanted to sample 4 to 6
    units from each site and 6 to 12 people from each
    unit. For the total concurrent sample this would
    provide enough units to average out or account for
    differential training effects and leadership
    climates, while still providing sufficient degrees
    of freedom for investigating within-group effects
    such as rater differences in performance appraisals
    and in work environment descriptions.

This procedure yielded a rather elaborate matrix of all
MOS by site by unit combinations that could reasonably be
sampled. From this, a specific sampling plan was prepared
for each site that represented the most efficient way
possible of obtaining our target across sites of 600 soldiers
in each of the 19 MOS.

For  a  few  MOS,  there  were  fewer  than  600  soldiers 
available  across  all  15  sites  with  the  appropriate  accession 
dates.  The  decision  was  made  to  slightly  over-sample  the 
remaining  MOS,  so  our  total  target  sample  was  still 
approximately  11,400. 


Preparation  for  Data  Collection 

Obtaining  support.  Work  began  over  a  year  in  advance  of 
the  actual  data  collection  to  obtain  the  support  necessary  to 
reach  our  target  sample.  Troop  Support  Requests  (TSR)  had  to 
be  submitted  far  in  advance,  detailing  the  purpose  of  the 
data  collection,  the  schedule  of  events,  the  locations,  the 
number of hours required of each soldier, and the complete
personnel, classroom, and equipment requirements. After the
TSR were submitted, senior project staff met with the Chief
Executive Officers (four-star generals) of the organizations
providing support. Numerous briefings were conducted at
various points down the chain of command, culminating in a
two-day  meeting  with  the  Point  of  Contact  (POC)  assigned  to 
this  effort  at  each  site,  six  months  prior  to  data  collection 
at  that  site.  From  this  point  on,  we  coordinated  primarily 
with  the  POC,  who  was  responsible  for  providing  the  required 
troops  to  be  tested,  test  scorers  and  other  support 
personnel,  equipment,  classrooms,  etc.  This  sequence  of 
activities  could  be  summarized  as  twelve  months  of  planning, 
briefing,  coordinating,  cajoling,  visiting  and  monitoring. 




Training  data  collection  teams.  In  order  to  cover  15 
sites  in  a  relatively  short  period  of  time,  where  each  site 
required  four  to  eight  weeks  of  testing,  several  data 
collection  teams  had  to  be  assembled  and  trained. 


Each  data  collection  team  was  composed  of  a  Test  Site 
Manager  and  six  or  seven  team  members  who  were  responsible 
for  predictor  and  criterion  administration.  Test  Site 
Managers  were  selected  from  regular  project  staff  who  had 
participated  heavily  in  the  field  tests.  The  remaining  team 
members  were  made  up  of  a  combination  of  regular  project 
staff  and  individuals  (e.g.,  graduate  students)  specifically 
recruited  for  the  data  collection  effort.  This  team  was 
assisted  on-site  by  eight  Non-commissioned  Officer  (NCO) 
scorers  (for  the  Hands-On  tests) ,  one  company-grade  officer 
POC,  and  up  to  five  NCO  support  personnel. 

The  data  collection  teams  were  given  three  days  of 
training at a central location. During this period, Project
A  was  explained  in  detail,  including  its  operational  and 
scientific  objectives.  After  the  logistics  of  how  the  team 
would  operate  (transportation,  meals,  etc.)  were  discussed, 
the  procedures  for  data  entry  from  the  field  to  the  computer 
data  base  were  explained  in  some  detail.  Emphasis  was  placed 
on  how  to  reduce  entry  errors  by  ensuring  careful  recording 
of  responses  and  correct  identification  of  answer  sheets  and 
computer  diskettes. 

Next,  each  predictor  and  criterion  measure  was  examined 
and  explained.  The  trainees  took  each  predictor  test  in  its 
entirety,  and  worked  through  samples  of  each  criterion 
measure.  Considerable  time  was  spent  on  the  nature  of  the 
performance  rating  scales,  rating  errors,  rater  training,  and 
the  procedures  to  be  used  for  administering  the  ratings.  All 
predictor  and  criterion  administration  manuals,  which  had 
been  prepared  in  advance,  were  studied  and  reviewed,  role 
playing  exercises  were  conducted,  and  hands-on  instruction 
for  maintenance  of  the  computerized  test  equipment  was  given. 

The  intent  was  that  by  the  end  of  the  three-day  session 
each  team  member  would  (a)  be  thoroughly  familiar  with  all 
predictor  tests  and  performance  measures,  (b)  understand  the 
goals  of  the  data  collection  and  the  procedures  for  obtaining 
these  goals,  (c)  have  practiced  administering  the  instruments 
and  received  feedback,  and  (d)  be  committed  to  making  the 
data  collection  as  error-free  as  possible. 

As  noted  above,  eight  NCO  scorers  were  required  for 
Hands-On  test  scoring.  Training  for  these  scorers  took  place 
on site over one full day and consisted of (a) a thorough
briefing on Project A, (b) an opportunity to take the tests
themselves,  (c)  a  check-out  of  the  specified  equipment,  and 
(d)  multiple  practice  trials  in  scoring  each  task,  with 
feedback  from  the  project  staff. 


Data  Collection  Procedures 

Each soldier was tested for a total of either 16 or 8
hours, depending on whether he/she was in a "Batch A" MOS
(for which we had MOS-specific criterion measures) or a
"Batch Z" MOS (for which we did not). Some of the testing
could only be done in fairly small groups because of the
equipment required, e.g., Hands-On criterion tests and
computerized predictor tests. To accommodate this
restriction and still process the maximum possible number of
soldiers each day, the predictor and criterion measures were
arranged in four-hour testing blocks, each conducted in a
separate location. A group of soldiers could then be
separated randomly into subgroups and the subgroups rotated
through the separate testing blocks. The measures
administered in each testing block are shown in Figure 1.

Data  Base  Preparation 


Description  of  Data  Base 

A total of 9,430 entry-level soldiers in 19 MOS were
tested during the concurrent validation data collection.
This represents approximately 83% of the total target sample
of 11,400. Figure 2 presents a breakdown of this sample by
site and by MOS. The MOS are grouped by "batch". Recall
that for nine MOS, designated Batch A, an extensive array of
criterion measures was developed and administered, including
a number of MOS-specific measures. For the remaining 10 MOS,
designated Batch Z, an abbreviated set of criterion measures
was used.


All 19 MOS received the same set of predictors. A
complete  listing  of  these  predictor  and  criterion  measures 
appears  below: 




A.  Predictors:

    •  Paper-and-pencil tests: six cognitive ability
       tests

    •  Computer battery: 10 perceptual/psychomotor
       tests

B.  Criteria:

    •  Hands-on tests: observation and scoring of
       performance on 14-17 carefully sampled job
       tasks (Batch A only)

    •  Job knowledge tests: written tests of facts
       and procedures for 30 carefully sampled job
       tasks (Batch A only)

    •  School knowledge tests: written tests of facts
       and procedures taught during training for the
       MOS

    •  Ratings of performance by peers and
       supervisors on several sets of rating scales,
       including:

       -  11 Army-Wide Behavior Summary Scales
       -  8 to 13 MOS-Specific Behavior Summary Scales
          (Batch A only)
       -  15 Job Task Rating Scales (Batch A only)
       -  11 Common Task Rating Scales (Batch Z only)
       -  40 Combat Performance Prediction Scales

    •  Self-report of administrative and personnel
       records, including:

       -  letters and commendations
       -  Physical Readiness Test Score
       -  Marksmanship Score
       -  disciplinary actions (Articles 15, Flag
          Actions)

    •  Job History: "how often" and "last time" 30
       sampled job tasks were performed (Batch A
       only)

    •  Work Environment: ratings of 99 items
       concerning situation at work




Table  1  illustrates  the  sheer  volume  of  data  that  were 
collected. 


Editing and Preparation of Data Files

One  of  the  first  steps  in  dealing  with  so  many  different 
instruments,  collected  at  different  times  at  different 
testing  stations,  was  to  match  up  all  of  these  pieces  of 
information  using  the  common  identifier,  in  our  case  the 
Social Security Number (SSN). Before any attempt was made to
edit  each  individual  file  for  merging,  a  Link  file  was 
created.  Basically,  the  Link  file  consisted  of  the  SSN  for 
each  soldier  for  whom  we  had  any  data  and  a  flag  for  each 
data  source.  The  idea  was  to  build  a  relatively  manageable 
file  and  to  resolve  all  the  problems  concerning  the 
identifiers  before  merging. 
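
The Link file idea can be sketched in a few lines. This is only an illustration under stated assumptions: the source names in SOURCES and the sample SSNs are hypothetical, and the paper does not describe the actual file layout beyond "an SSN and a flag for each data source".

```python
from collections import defaultdict

# Hypothetical data-source names; the paper does not list the real flags.
SOURCES = ("predictor_paper", "predictor_computer", "hands_on", "ratings")


def build_link_file(ssns_by_source):
    """Return {ssn: {source: bool}} for every SSN seen in any source."""
    link = defaultdict(lambda: dict.fromkeys(SOURCES, False))
    for source, ssns in ssns_by_source.items():
        for ssn in ssns:
            link[ssn][source] = True
    return dict(link)


def single_source_ssns(link):
    """SSNs flagged in only one source: likely gridding errors to check by hand."""
    return sorted(ssn for ssn, flags in link.items() if sum(flags.values()) == 1)


# Made-up example: one soldier's SSN was mis-gridded on the hands-on form.
example = {
    "predictor_paper": ["111223333", "444556666"],
    "predictor_computer": ["111223333", "444556666"],
    "hands_on": ["111223333", "444556667"],
    "ratings": ["111223333", "444556666"],
}
link = build_link_file(example)
suspects = single_source_ssns(link)
```

Working on this small file first keeps the identifier problems separate from the much larger instrument files.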

We found that, although soldiers could reliably write
their SSN on a piece of paper, they did not always "grid"
them correctly on our machine-scannable forms. In general,
we found about 5 percent SSN errors in our sample. There is
no simple way to identify the erroneous digit(s), so a great
deal of time was spent matching unmatched records by hand.

In addition to editing the SSN in our Link file, we also
spent some time editing a selected set of demographic
variables, including sex and race.

It was not sufficient to verify that variables such as
sex and race were within range. In order to identify
"errors" on these variables, we merged two other Army data
sources with our data base and compared all three sources for
discrepancies. In the case of sex codes, we frequently
inspected the soldiers' first names to resolve differences.
In the case of race variables, we used a two-out-of-three
majority rule.
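
A two-out-of-three majority rule is simple to state in code. The sketch below is illustrative only; the function name and the code values are hypothetical, and the paper does not say what was done when all three sources disagreed, so that case returns None here.

```python
from collections import Counter


def majority_of_three(project_value, army_source_1, army_source_2):
    """Return the value reported by at least two of the three sources.

    Returns None when all three sources disagree (handling of that case
    is an assumption; the paper does not describe it).
    """
    value, count = Counter(
        [project_value, army_source_1, army_source_2]
    ).most_common(1)[0]
    return value if count >= 2 else None


resolved = majority_of_three("A", "B", "B")
unresolved = majority_of_three("A", "B", "C")
```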

After the initial editing of basic identifiers and
demographic variables was complete and before different
pieces of data were merged and ready for analysis, there were
many issues that needed to be resolved. These issues
included random responding, missing data, different testing
conditions, and equipment differences. The decisions that
were made regarding how to deal with each issue have
subsequently proved to be extremely important to the analyses
that were performed.




Random Responding. For multiple-choice tests, it is not
uncommon for some respondents to mark the answer sheets
at random. This is particularly true when there is no real
incentive for taking the test to begin with, unlike tests
such as the SAT or GRE. Three different procedures were
developed to identify random responding. The first method
was to review the test administrators' "problem" logs. At
each site, the test administrator was instructed to write
down any unusual situations during testing, e.g., when it was
obvious that someone was not taking the test seriously. Each
entry on these logs was entered on a computer file with the SSN
and a code for that particular "problem". These problems
included responding randomly, refusing to cooperate,
and falling asleep. Scores were not computed for
tests we were told had been completed at random.

The  second  method  used  to  detect  random  responding  was 
to  score  the  eight  items  that  were  developed  for  this  purpose 
and  embedded  in  one  of  the  predictors.  These  items  had  an 
extremely  obvious  "right"  answer.  For  example: 

The branch of the service that deals most with airplanes
is the:

1.  Military Police

2.  Coast Guard

3.  Air Force

If a soldier got three or more of these eight items
wrong, he/she was assumed to be responding randomly and
scores were not computed for that instrument.
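
The embedded-item screen amounts to a count against a fixed cut-off. The sketch below assumes a hypothetical answer key and treats an omitted check item as wrong; the paper specifies neither detail.

```python
# Three or more wrong out of eight check items flags the answer sheet.
MAX_WRONG_ALLOWED = 2

# Hypothetical key for the eight embedded items (not from the paper).
CHECK_KEY = [3, 1, 2, 3, 1, 2, 3, 1]


def flags_random_responding(check_responses, key=CHECK_KEY):
    """True if the soldier missed three or more of the check items.

    An omitted item (None) is counted as wrong, which is an assumption.
    """
    wrong = sum(1 for given, correct in zip(check_responses, key)
                if given != correct)
    return wrong > MAX_WRONG_ALLOWED


two_wrong = [3, 1, 2, 3, 1, 2, 1, 2]
three_wrong = [3, 1, 2, 3, 1, 1, 1, 2]
```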

The  third  method  used  was  a  random  response  index.  For 
the  written  tests,  a  random  response  index  was  defined  as  the 
correlation  between  the  item  score  (1  for  correct  and  0  for 
incorrect)  and  item  difficulty  (expressed  as  the  proportion 
of subjects who answered the item correctly). For most
soldiers  this  correlation  was  positive  since  there  was  a 
tendency  to  get  the  easier  items  correct  and  miss  the  more 
difficult  items.  In  a  few  cases  this  correlation  was 
essentially  zero,  suggesting  random  responding.  For  these 
subjects,  all  of  their  responses  for  that  particular 
instrument  were  set  to  missing. 
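
The random response index described above is a Pearson correlation computed within one examinee, across items. A minimal sketch follows, using made-up 0/1 score data; the paper gives no numeric cut-off for "essentially zero", so none is assumed here.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation; returns None if either input has no variance."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return None
    return sxy / sqrt(sxx * syy)


def item_difficulties(score_matrix):
    """Proportion answering each item correctly (rows are examinees)."""
    n = len(score_matrix)
    return [sum(column) / n for column in zip(*score_matrix)]


def random_response_index(item_scores, difficulties):
    """Correlation between one examinee's item scores and item difficulty."""
    return pearson(item_scores, difficulties)


# Made-up data: four examinees, four items (1 = correct, 0 = incorrect).
scores = [
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 1, 1, 0],
    [0, 1, 0, 1],  # pattern unrelated to item difficulty
]
p_values = item_difficulties(scores)
typical = random_response_index(scores[0], p_values)
suspect = random_response_index(scores[3], p_values)
```

In this toy data the first examinee's index is positive, while the last examinee's index is zero, the pattern the screen is meant to catch.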

For  the  performance  rating  scale  data,  we  screened  for 
unreliable  raters.  We  constructed  reliability  indices  for 
each  rater  by  comparing  their  ratings  with  the  average  of  all 
other raters' ratings of the same soldiers on the same
scales. Both mean difference and correlational indices were
used  in  identifying  "outliers"  among  the  raters. 

Missing Data. No matter how carefully any data
collection is planned and monitored, some amount of missing
data is inevitable for various reasons. Some of the
reasons for missing data in our data set are shown in
Figure 3.




One option for dealing with missing data is to delete
all of the records for any soldier who had any missing data.
This was not an acceptable procedure for our data set.
Table 2 shows the number of Batch A soldiers with different
patterns of complete and missing data across the four main
performance measurement methods: School Knowledge Test (SK),
Job Knowledge Test (JK), Hands-On Tests (HO), and Ratings
(RA). Fewer than 15% of the cases in the entire sample have
complete data for all four methods. If the ratings data are
set aside, there are still fewer than 25% of the subjects
with complete hands-on, job knowledge, and school knowledge
data. Ignoring the hands-on data still leaves only about 42%
of the subjects with complete data on the remaining measures.
Even if one were willing to conclude that the sample of
soldiers with complete data is representative of the target
population, the sheer loss of statistical power associated
with the reduced sample size would be unacceptable.
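
The three percentages quoted above can be checked directly from the cell counts in Table 2. The sketch below transcribes those counts; the tuple layout and function name are illustrative choices, not part of the original analysis.

```python
# Cell counts (N) from Table 2. Keys are (HO, RA, SK, JK) completeness flags.
HO, RA, SK, JK = range(4)
counts = {
    (True, True, True, True): 772,    (True, True, True, False): 189,
    (True, True, False, True): 122,   (True, True, False, False): 58,
    (True, False, True, True): 526,   (True, False, True, False): 130,
    (True, False, False, True): 72,   (True, False, False, False): 29,
    (False, True, True, True): 1436,  (False, True, True, False): 364,
    (False, True, False, True): 215,  (False, True, False, False): 125,
    (False, False, True, True): 784,  (False, False, True, False): 241,
    (False, False, False, True): 125, (False, False, False, False): 80,
}
total = sum(counts.values())


def pct_complete(required_methods):
    """Percent of cases with complete data on every required method."""
    n = sum(c for key, c in counts.items()
            if all(key[m] for m in required_methods))
    return 100 * n / total


all_four = pct_complete([HO, RA, SK, JK])      # listwise deletion on all four
without_ratings = pct_complete([HO, SK, JK])   # ratings set aside
without_hands_on = pct_complete([RA, SK, JK])  # hands-on ignored
```

The results round to 14.65%, 24.6%, and 41.9% of the 5,268 Batch A cases, matching the "fewer than 15%", "fewer than 25%", and "about 42%" figures in the text.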

The  processing  of  missing  data  was  approached  in  two 
stages.  In  the  first  stage,  we  focused  on  one  instrument  at 
a  time  and  dealt  with  only  those  subjects  who  were  missing  a 
small amount of data on the instrument under consideration.
In the second stage, we formulated procedures for dealing
with  subjects  who  were  missing  a  high  percentage  or  all  of 
the  data  on  a  given  instrument. 

Stage I: Missing Data within Each Instrument. We
examined the distribution of the missing data for each
instrument and found that most were bimodal. Most soldiers
had only a small number of missing items or scales, but a
small number had all or nearly all elements missing. For
cases with minimal missing data (usually a ten percent limit
was used), we filled in missing values so as to be able to
compute overall performance scores. (The procedure used for
imputing values is discussed at the end of this section.) For
cases with larger amounts of missing data, we did not attempt
to compute any scores for the instrument in question. In
general, we sought to retain 90-95% of the soldiers tested
in each MOS, but to eliminate cases with more than 10%
missing elements.
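
The Stage I screen can be expressed as a three-way decision per examinee per instrument. This sketch assumes missing elements are coded as None and applies the ten percent limit uniformly, although the text says that limit was only "usually" used.

```python
MAX_MISSING_FRACTION = 0.10  # "usually a ten percent limit was used"


def stage_one_disposition(elements):
    """Classify one examinee's record on one instrument.

    elements: item or scale values, with None marking a missing element.
    Returns "complete", "impute" (fill in the few missing values), or
    "no_score" (too much missing; no score computed for the instrument).
    """
    n_missing = sum(value is None for value in elements)
    if n_missing == 0:
        return "complete"
    if n_missing / len(elements) <= MAX_MISSING_FRACTION:
        return "impute"
    return "no_score"


full = [1] * 20
two_missing = [1] * 18 + [None] * 2    # exactly 10% missing
three_missing = [1] * 17 + [None] * 3  # 15% missing
```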

Hands-on measures have a different pattern of missing
data that warrants a more detailed discussion here. First of
all, for several MOS, the hands-on scoring differed for
different equipment. In order to achieve comparable scores
across these equipment differences, we split the examinees into
separate "tracks" corresponding to the different variations
in equipment. For Military Police (95B), for example,
females use and were tested on a .38 caliber handgun while
males use and were tested on a .45 caliber handgun. We
found minimal differences between track samples on those
tasks that were scored the same, so we achieved comparable
scoring by standardizing scores computed from tracked tasks
separately for each track sample. Scores for each track were
standardized to have a mean and standard deviation that
matched the original overall mean and standard deviation for
the score in question.
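
One way to read the track standardization is as a within-track z-score rescaled to the pooled mean and standard deviation. The sketch below follows that reading; the use of the population standard deviation, the track labels, and the scores are all assumptions made for illustration.

```python
from statistics import mean, pstdev


def standardize_by_track(scores_by_track):
    """Rescale each track's scores to the pooled mean and SD.

    scores_by_track: {track label: [raw scores]}. Each track is assumed
    to contain some variation (a zero SD would divide by zero).
    """
    pooled = [s for scores in scores_by_track.values() for s in scores]
    target_mean, target_sd = mean(pooled), pstdev(pooled)
    rescaled = {}
    for track, scores in scores_by_track.items():
        track_mean, track_sd = mean(scores), pstdev(scores)
        rescaled[track] = [
            target_mean + target_sd * (s - track_mean) / track_sd
            for s in scores
        ]
    return rescaled


# Made-up raw scores on a task scored differently for two tracks.
raw = {"track_38": [10.0, 12.0, 14.0], "track_45": [20.0, 25.0, 30.0]}
rescaled = standardize_by_track(raw)
pooled_mean = mean(raw["track_38"] + raw["track_45"])
pooled_sd = pstdev(raw["track_38"] + raw["track_45"])
```

After rescaling, each track has the same mean and spread, so a soldier's score reflects standing within his or her own track rather than which equipment was used.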




We also checked for anomalies in the Hands-On data
such as outliers in quantitative scores, incompatible
pass-fail patterns of scores, incorrect coding of soldier ID
numbers, and incorrect coding of tracked tests for soldiers.
This examination produced 1200 queries. In about 3/4 of the
cases, we were able to reach some resolution, providing
scores for missing data points or corrected scores for
incompatible scores, by retrieving the original scoresheets.
For example, one common problem was for scorers to write a
note that a soldier could not do any of the steps in a task,
and then not mark the "NO-GO" (fail) columns for those tasks
on the scoresheet. These "missing" scores were changed to
"NO-GO" scores.

By far, the greatest reason for non-equivalence in data
sets across soldiers was related to equipment. In six of the
nine MOS, anticipated variations in equipment necessitated
the preparation of tracked versions of tests. Two of these
tracks resulted in whole tasks not being administered to a
large number of soldiers; an additional 14 tasks were tracked
with either completely separate versions of the test or
branching within steps of the test. In addition,
unanticipated variations in equipment required Hands-On test
managers (project staff) to modify tests in three MOS on site
by specifying steps that should not be scored on existing
tests. Seven tasks had such "last-minute" tracks. On an
MOS-by-MOS basis, rules were established, consistent with Army
doctrine, for equating performance scores across these
equipment variations. Discounting these planned and
unplanned tracks, most of the remaining cases of incomplete
data resulted from unavailable or faulty equipment. Thirteen
percent of the task tests could not be administered because
of unavailable equipment and two percent could not be scored
due to faulty equipment.

The remaining cases of missing Hands-On data were due to
a variety of circumstances, most of them unavoidable, such
as soldiers who had physical handicaps that prohibited them
from performing certain activities, injuries, illness, and
competing demands for the soldiers' time.

After  dropping  cases  with  too  much  missing  data  or  with 
random  responses,  we  imputed  values  for  the  remaining  missing 
data  so  that  summary  scores  could  be  computed.  The  option 
that  we  used  to  fill  in  missing  values  was  a  procedure  that 
had  been  developed  for  the  National  Center  for  Education 
Statistics  (now  the  Center  for  Education  Statistics) 
known  as  PROC  IMPUTE.^  Several  features  of  PROC  IMPUTE  made 
it  preferable  to  other  readily  available  options  for  filling 
in  the  missing  values. 


^Wise,  L.L.  &  McLaughlin,  D.H.  (1980).  Guidebook  for 
the  imputation  of  missing  data.  Palo  Alto,  CA:  American 
Institutes  for  Research. 




PROC IMPUTE uses regression equations to predict missing
values and also adds a random variable with variance equal to
the error of estimate for predicting the missing value, so
that the imputed values are not too highly correlated with
the other, nonimputed values.
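
The general idea of regression imputation with an added random residual can be sketched as follows. This is not PROC IMPUTE itself: the single-predictor regression, the normally distributed residual, and all names and data here are assumptions made for illustration.

```python
import random
from math import sqrt


def fit_line(x, y):
    """Least-squares line plus the standard error of estimate."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sxx
    intercept = my - slope * mx
    residuals = [b - (intercept + slope * a) for a, b in zip(x, y)]
    see = sqrt(sum(r * r for r in residuals) / (n - 2))
    return slope, intercept, see


def impute_value(x_known, slope, intercept, see, rng):
    """Predicted value plus a random draw with SD equal to the SEE.

    Without the random term, imputed values would fall exactly on the
    regression line and look more predictable than observed values.
    """
    return intercept + slope * x_known + rng.gauss(0.0, see)


# Made-up complete cases: x is an observed score, y is the score to impute.
x_obs = [1.0, 2.0, 3.0, 4.0, 5.0]
y_obs = [2.1, 3.9, 6.2, 7.8, 10.1]
slope, intercept, see = fit_line(x_obs, y_obs)
rng = random.Random(1985)  # fixed seed so the sketch is repeatable
imputed = impute_value(3.5, slope, intercept, see, rng)
```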

PROC IMPUTE was used in all instances except one. For
the written tests, a distinction was made between internal
omits (prior to the last item answered) and items that were
not reached (omits after the last item answered). For
internal omits, we assumed that the examinee did not know the
answer and substituted a score equal to the guessing rate
(e.g., .2 for a 5-option item). If the actual proportion
passing the item was lower than the guessing rate, the
proportion passing was used instead. We made no assumptions
regarding items not reached, since the examinee may not have
had time to demonstrate knowledge of the item. Not-reached
items were imputed with PROC IMPUTE, as were all missing
hands-on steps and rating scales.
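
The treatment of omitted items on the written tests can be sketched as below. The function name, the None coding for omits, and the sample data are hypothetical; not-reached items are simply left as None for a later imputation step.

```python
def score_written_test(responses, key, n_options, proportion_passing):
    """Score one answer sheet, separating internal omits from not-reached.

    responses: chosen option per item, None for an omitted item.
    Returns item scores: 1.0 or 0.0 for answered items, the lesser of the
    guessing rate and the item's proportion passing for internal omits,
    and None for not-reached items (left for imputation).
    """
    answered = [i for i, r in enumerate(responses) if r is not None]
    last_answered = answered[-1] if answered else -1
    guessing_rate = 1.0 / n_options
    scores = []
    for i, (response, correct) in enumerate(zip(responses, key)):
        if response is not None:
            scores.append(1.0 if response == correct else 0.0)
        elif i < last_answered:
            scores.append(min(guessing_rate, proportion_passing[i]))
        else:
            scores.append(None)
    return scores


key = [1, 2, 2, 4, 5]
sheet = [1, None, 3, None, None]  # item 2 omitted; items 4-5 not reached
easy_item = score_written_test(sheet, key, 5, [0.9, 0.6, 0.5, 0.4, 0.3])
hard_item = score_written_test(sheet, key, 5, [0.9, 0.15, 0.5, 0.4, 0.3])
```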

Stage II: Missing Instruments. After cases were dropped
or missing values were filled in on an instrument-by-instrument
basis, we were ready to compute overall
performance scores that combined information from the
different measurement methods. The decision at this stage
was whether to estimate individual scores if only partial
data were available for the individual. We decided on a 50%
rule. An examinee had to have data on at least half of the
instruments going into a particular performance construct
before we would estimate a score on that construct. Where
50% or fewer of the pieces were missing, PROC IMPUTE was
again used to fill in the missing pieces.
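
The 50% rule can be written as a small decision function. The instrument names below are hypothetical labels for the pieces of one performance construct, and None marks an instrument with no usable score.

```python
def construct_disposition(instrument_scores):
    """Apply the 50% rule to one examinee on one performance construct.

    instrument_scores: {instrument name: score or None}.
    Returns "complete", "impute" (half or fewer pieces missing), or
    "no_score" (more than half missing; construct score not estimated).
    """
    n_total = len(instrument_scores)
    n_missing = sum(score is None for score in instrument_scores.values())
    if n_missing == 0:
        return "complete"
    if 2 * n_missing <= n_total:
        return "impute"
    return "no_score"


# Hypothetical construct built from four instruments.
half_missing = {"hands_on": None, "job_knowledge": 0.7,
                "school_knowledge": 0.6, "ratings": None}
mostly_missing = {"hands_on": None, "job_knowledge": None,
                  "school_knowledge": 0.6, "ratings": None}
```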

Table 3 shows the number of soldiers in each MOS who had
missing values for each instrument after the completion of
the Stage I imputations and screening. In most instances,
the number of missing cases was quite small (1 or 2%). The
chief exceptions were two of the administrative measures.
(Administrative measures were not included in Stage I imputations
because they do not include a large number of component
parts.) Physical Readiness Test scores were missing for 10
to 15% of the examinees. In most instances, peer and supervisor
ratings of physical fitness were available for these
same examinees. Similarly, Promotion Rate Deviation scores
were missing for a significant number of cases (15%). This
was primarily due to problems in retrieving Accession File
information needed to compute time-in-service.




Summary


Collecting data from 10,000 soldiers in 15 locations
over six months is a difficult task, one that requires
careful planning, attention to detail, an ability to adapt, a
fondness for crisis management, and a special relationship
with the telephone. For anyone planning an effort of like
grandeur (or even grander), a few lessons learned from some
of the survivors seem appropriate.

Planning. Start as early as possible (18 months before
collecting data) to identify the support you will need,
including personnel, equipment, facilities, and time requirements.
Once you know what you need and when you need it,
schedule a series of briefings. Start at the top with the
Chief Executive Officers of the organizations that provide the
support and work your way through a series of briefings until
you reach the local POC responsible for seeing that you get
what you need when you need it. Be prepared to change your
plans at each step to meet local concerns. Once you meet and
brief your POC, you can begin coordinating.

Coordinating. The closer you get to the start of data
collection, the more frequently you will speak to the POC.
Expect to speak daily when you get within 30 days of data
collection. In some instances, you may have to make a trip
to the installation for a final coordination meeting. Be
prepared to be very flexible with regard to the
installation's internal schedule.

Operating. Most of the lessons learned in this category
have  to  do  with  hands-on  testing. 

1.  Many  instances  of  equipment  variation  can  be  (and 
were)  anticipated.  Test  developers  and  site  coordinators 
must  find  out  what  major  pieces  of  equipment  are  not  likely 
to  be  available  at  the  selected  sites  in  advance  of  actual 
testing  if  high  quality  tracked  tests  are  to  be  prepared. 

2.  Printed  scoresheets  must  be  proofed  carefully  to 
ensure  that  for  every  step  which  should  be  scored,  a  score 
can  be  recorded. 

3.  Scorers must be thoroughly trained, not only on how
to set up and administer the tests, but also on how to record
data on the scoresheets. They must be given practice in
using the scoresheets (not just talked through it) before testing,
and monitored closely during testing, especially with the
first few soldiers tested. Continual monitoring must also
occur throughout the testing.

4.  Scorers  and  hands-on  managers  must  document 
meticulously  who  was  tested  on  what,  and  also  who  wasn't 
tested  on  what,  and  why. 




5.  Experienced  hands-on  managers  are  often  able  to 
implement procedures to deal with equipment malfunctions or
variations,  but  these  too  must  be  documented. 

6.  Completed  scoresheets  must  be  checked  as  soon  as 
possible  after  testing  so  that  careless  or  incorrect  scoring 
can  be  detected,  and  the  errant  scorer  can  be  retrained. 

Collecting. Many of the problems that we encountered in
linking could have been avoided if all of the data had been
checked carefully at the site before being sent off to the
scanning company. We also found that where we could use a
single header sheet for a group of instruments that could be
scanned together, there were fewer opportunities for
discrepancies.

Processing. Never try to do too much all at once! Deal
with one instrument at a time before merging. Frequently,
problems will get complicated after merging.

Imputing.  The  decision  rules  and  imputation  procedures 
used  with  the  CV  data  were  successful  in  allowing  us  to 
develop  performance  scores  for  a  very  high  proportion  of  the 
soldiers tested. Based on the available evidence, we have no
reason  to  believe  that  any  significant  distortions  were 
introduced  while  achieving  this  goal.  Relatively  few  values 
were  imputed  at  all.  Where  imputation  was  necessary,  it  was 
done  with  great  care. 

The  apparent  ease  of  imputation  procedures  should  not, 
however,  lead  us  to  relax  our  data  collection  procedures  in 
the  future.  Lessons  learned  from  investigation  of  the 
reasons  for  missing  data  will  be  used  to  modify  data 
collection  procedures  for  the  Project  A  longitudinal 
validation  so  as  to  further  reduce  the  amount  of  missing 
data. 




Figure 1

CONCURRENT VALIDATION TESTING BLOCKS (FOUR HOURS EACH)

          BATCH A MOS                   |            BATCH Z MOS
                                        |
Block 1   Predictor Tests               |  Block 1   Predictor Tests
                                        |
Block 2   School Knowledge Tests        |  Block 2   School Knowledge Tests
          MOS-Specific Job              |            Army-Wide Ratings
          Knowledge Tests               |
                                        |
Block 3   MOS-Specific Hands-On Tests   |
                                        |
Block 4   MOS-Specific Ratings          |
          Army-Wide Ratings             |

Figure 2

CONCURRENT VALIDATION SAMPLE
SOLDIERS BY MOS BY LOCATION


Figure 3

REASONS FOR MISSING DATA


KNOWLEDGE TEST

•  Soldiers Not Available for Part or All of Scheduled
   Time
•  Soldiers Exceptionally Slow in Taking Test
•  Soldiers Not Following Instructions

RATING DATA

•  No Suitable Raters Available
•  Soldier Does Not Perform Some Kinds of Tasks
•  Rater Not Following Instructions

HANDS-ON DATA

•  Anticipated Variation in Equipment
•  Unanticipated Variation in Equipment
•  Soldiers Not Available for Part or All of Scheduled
   Time
•  Equipment Breakdown or Nonavailability
•  Conditions Preventing Testing of Some Soldiers on Some
   Tasks
•  Scorer or Scoresheet Errors




TABLE 1

Concurrent Validation Data Base

                                   TOTAL       TOTAL BYTES    TOTAL
                                   RECORDS     OF INFO        DATA POINTS^

Predictor Data: Paper-&-pencil      88,669      7,241K         6,321K

Predictor Data: Computer           687,830     51,336K        11,832K

Criterion Data: Hands-on            77,921      3,326K         2,015K

Criterion Data: All other,         107,561     12,994K         9,283K
including written tests

TOTAL                              961,981     74,900K        29,451K

^Included all identifying information from each instrument




Table 2


NUMBER OF CASES WITH COMPLETE DATA FOR
EACH COMBINATION OF CRITERION INSTRUMENTS

Batch A MOS


                        Complete   Comp SK    Miss SK    Missing
                        SK & JK    Miss JK    Comp JK    SK & JK     TOTAL

Complete HO &     N        772        189        122         58       1141
Complete RA       %      14.65       3.59       2.32       1.10      21.66

Complete HO &     N        526        130         72         29        757
Missing RA        %       9.98       2.47       1.37       0.55      14.37

Missing HO &      N       1436        364        215        125       2140
Complete RA       %      27.26       6.91       4.08       2.37      40.62

Missing HO &      N        784        241        125         80       1230
Missing RA        %      14.88       4.57       2.37       1.52      23.35

TOTAL             N       3518        924        534        292       5268
                  %      66.78      17.54      10.14       5.54     100.00

Table 3

NUMBER OF CASES
MISSING EACH INSTRUMENT

                        11B   13B   19E   31C   63B   64C   71L   91A   95B

Total N                 702   667   503   366   637   686   514   501   692

Final Counts After Stage 3 Screening and Imputation

Missing Hands-On         20    55    29    25    68    46    20     5    27
Missing Job Know         24    29    44    40    41    18    13    18    29
Missing Sch Know         18    28    18    17    25    17    21    22    18
Missing AW BARS           7     2     1     8    12     8    11     3     0
Missing MOS BARS          9    12     3     9    18    13    23     8     0
Missing Comb Pred         7     2     1     8    12     8    11     3     0
Missing A1: Awards       14    24    13    13    11    12    14    11     4
Missing A1: Phys Red     63    93    53    30    80    81    60    59    57
Missing A4: Arts. 15     23    28    16    14    11    14    15    14     4
Missing A5: Prom Rt     109   143    83    62    97    86    79    61    84

Total Complete          512   406   335   241   411   486   355   374   513
% Complete             72.9  60.9  66.6  65.9  64.5  70.9  69.1  74.7  74.1

Final Counts After Stage II Imputation

Total Complete          693   656   490   356   615   675   506   492   686
% Complete             98.7  98.4  97.4  97.3  96.6  98.4  98.4  98.2  99.1
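
The "% Complete" rows appear to be the "Total Complete" counts divided by "Total N" for the same column. A minimal sketch of that arithmetic, using the first-column values from the table (702 cases, 512 complete before Stage II imputation, 693 after); the function name is hypothetical and the one-decimal rounding is an assumption based on how the table is printed:

```python
def percent_complete(complete: int, total: int) -> float:
    """Share of cases with complete data, as a percentage rounded to one decimal."""
    if total <= 0:
        raise ValueError("total must be positive")
    return round(100.0 * complete / total, 1)


# First column of the table: 702 cases in total.
before_stage_ii = percent_complete(512, 702)
after_stage_ii = percent_complete(693, 702)
print(before_stage_ii, after_stage_ii)
```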



CHARACTERISTICS  OF  BIODATA  ITEMS  AND 
THEIR  RELATIONSHIP  TO  VALIDITY 


Bruce  N.  Barge 

Personnel  Decisions  Research  Institute 


Presented  on  Symposium, 

"The  Use  of  Biodata  in  the  1980s  and  Beyond" 


At  the  Annual  Convention  of  the 
American  Psychological  Association 
New York


August  1987 


The views expressed in this paper are those of the authors and do not
necessarily reflect the official opinions and policies of the U.S. Army
Research Institute or the Department of the Army.


Abstract 


To investigate factors related to the predictive ability of
biodata, a set of biodata items was evaluated in terms of three
hypothesized characteristics: heterogeneity, behavioral
discreteness, and behavioral consistency. Evaluations for each
item  were  correlated  with  the  validity  of  the  item,  which  had  been 
previously  obtained  in  a  large  criterion-related  study  predicting 
both  job  performance  and  training  performance.  Results  suggest 
that  the  item  characteristics  can  be  rated  reliably  and  that  the 
ratings  are  significantly  related  to  validities.  Implications  of 
the  study  are  discussed  for  both  conceptual  understanding  of 
biodata  and  increased  predictability  for  biodata  measures. 


Characteristics of Biodata Items
and Their Relationship to Validity

Biographical  data  (biodata)  has  been  widely  used  for  many 
years  and  has  yielded  impressive  validities  in  predicting 
performance in applied settings. Ghiselli (1955) found biodata to
be among the best predictors available in his extensive review of
cognitive and non-cognitive predictors, and more recent reviews by
Owens  (1976),  Reilly  and  Chao  (1982),  and  Barge  and  Hough  (1984) 
have  also  supported  the  outstanding  predictive  power  of  biodata. 
Despite  its  notable  validity,  however,  the  mechanisms  through 
which  biodata  attains  prediction  are  still  poorly  understood.  The 
constructs  that  organize  and  define  the  domain  have  not  been 
widely  accepted,  and  characteristics  of  a  biodata  instrument  that 
produce  its  validity  have  not  been  explicitly  identified. 

Conceptually oriented research on biodata has increased in
the last ten years, led by Owens and his colleagues who developed
and  validated  a  conceptual  model  and  associated  biodata  subgroups 
(Owens,  1976;  Owens  &  Schoenfeldt,  1979;  Davis,  1984).  Other 
research  has  addressed  stability  and  generalizability  of  biodata 
factors (Eberhardt & Muchinsky, 1982a; Lautenschlager & Shaffer,
1987), relationships between biodata and vocational interests
(Eberhardt & Muchinsky, 1982b, 1984), and linking of biodata to
job analysis components (Pannone, 1984). Rational keying of
biodata scales has also been researched (Matteson, 1978; Mitchell
& Klimoski, 1982).


While this increased conceptual orientation has been
valuable, other aspects of biodata have received relatively little
attention. In particular, little is known about the
characteristics of biodata measures that contribute to their
predictive validity. Many developers of biodata instruments have
their  own  theories  of  what  produces  a  valid  inventory,  but  these 
theories  are  typically  informal  and  have  usually  not  been  tested 
empirically.  The  purpose  of  this  research  is  to  delineate  three 
hypotheses  pertaining  to  biodata  item  validity,  and  to  test  these 
hypotheses  formally. 

The  first  hypothesis  relates  to  the  heterogeneity  of  biodata 
items, the tendency for biodata items to incorporate several
"pure" behavioral tendencies or traits in a single complex
behavior. For example, a biodata item pertaining to performance
in school may simultaneously tap intelligence, academic interest,
dependability, and several other characteristics. This
heterogeneity may contribute to the item's validity because most
performance  criteria  are  also  heterogeneous  and  require  the 
application  of  several  characteristics  simultaneously.  Thus, 
biodata  items  that  are  maximally  heterogeneous  may  produce  optimal 
validity. 

A  second  hypothesis  pertains  to  the  behavioral  discreteness 
of  biodata  items.  Behavioral  discreteness  refers  to  the  tendency 
for biodata items to address a single, perhaps verifiable,
behavior rather than a more abstract or summary characteristic.
For example, a biodata item may ask about the number of jobs a
respondent has held rather than asking for a self-perception of
stability. Since the response to a more discrete item involves
less respondent evaluation and is also potentially verifiable, the
information provided may be more accurate and may produce better
prediction.

The final hypothesis to be tested involves behavioral
consistency, or the extent to which the behavior addressed in the
biodata item parallels behavior involved in the criterion. This
hypothesis is related to the "sign vs. sample" distinction
(Wernimont & Campbell, 1968), which suggests that predictors can
function as a "sample" that very closely parallels criterion
behaviors or as a "sign" that may be very different in content
from  criterion  behavior.  An  assumed  advantage  of  biodata  items  is 
that  the  behaviors  they  address  are  more  similar  to  the  criterion 
than  may  be  the  case  for  "sign"  measures  such  as  personality  or 
vocational  interests.  This  similarity  results  in  less  of  an 
inferential  leap  from  predictor  to  criterion  and  may  therefore 
improve  validity.  Thus,  biodata  items  developed  to  closely 
parallel  criterion  behaviors  may  produce  the  highest  validity. 

Each  of  the  hypotheses  described  above  has  been  mentioned 
previously  as  a  potential  reason  for  the  high  validity  of  biodata 
measures. Each hypothesized characteristic also distinguishes
biodata from other domains of assessment, since heterogeneity,
behavioral  discreteness,  and  behavioral  consistency  are  more  often 
characteristics  of  biodata  items  than  of  items  from  related 
domains,  e.g.  personality  or  vocational  interests.  The 
characteristics  may  describe  quite  well  the  biodata  domain  as  a 
whole,  yet  individual  biodata  items  may  also  differ  a  great  deal 
in  their  standing  on  each  of  the  characteristics.  The  objective 
of  this  research  was  to  determine  whether  item  differences  on  the 
characteristics  can  be  evaluated  reliably  and  whether  evaluations 
of these characteristics are related to the criterion-related
validity  of  the  items. 

Method 

Biodata Items

A total of 103 items from Owens' Biographical Questionnaire (BQ;
Psychological Corporation, Copyright, 1987) were evaluated in this
investigation. The BQ is among the most heavily researched
biodata inventories in existence and is composed of the items that
were found to measure the biodata domain best in a series of
iterative analyses (cf. Owens & Schoenfeldt, 1979). Items pertain
primarily  to  experiences  occurring  early  in  life  and  during  and 
shortly  after  high  school,  which  was  appropriate  for  the  sample 
from  which  the  validity  data  were  available.  Items  were  also 
quite  diverse  in  content  and  addressed  virtually  all  significant 
areas of life experience. A multiple choice, continuous response
format was used with the items (e.g., "How active have you been in
athletics? Extremely Active, Very Active, etc.").

Raters 

Two  groups  of  raters  were  included  in  the  investigation, 
referred  to  as  experts  and  students.  The  expert  group  includes  40 
of  the  country's  most  highly  recognized  and  experienced  biodata 
researchers,  each  of  whom  has  published  in  the  area  and  many  of 
whom  have  developed  and  validated  biodata  inventories  personally. 
Although  all  members  of  this  group  could  be  considered  expert, 
they  differ  widely  in  perspective.  Some  are  employed  in  industry, 
some  at  universities,  some  at  research  firms,  and  a  few  are 
retired. In addition, their ideas about biodata are often
strikingly different (based on their published work), ranging from
a  highly  conceptual  orientation  to  a  strict  empirical  prediction 
stance. 

The  student  group  includes  17  graduate  students  in 
industrial/organizational  psychology  at  the  University  of 
Minnesota.  At  the  time  of  the  investigation,  students  had 
completed  between  6  months  and  3  years,  6  months  of  graduate 
education,  but  none  had  extensive  experience  or  knowledge 
regarding  biodata  in  general  or  the  development  of  a  biodata 
inventory.  The  students  differed  somewhat  in  amount  of 
experience,  both  in  validation  research  and  in  research  in 
general. 




Ratings of Characteristics


Rating  packages  were  developed  for  each  rater,  including  an 
explanation  of  each  hypothesized  item  characteristic,  a  set  of 
instructions  and  example  item  ratings,  and  a  rating  booklet.  The 
rating  booklet  collected  three  ratings  (one  for  each  hypothesized 
characteristic)  for  each  of  the  103  items.  Raters  were  asked  to 
read  through  the  rating  materials,  make  their  ratings,  and  return 
their  completed  booklets.  They  were  also  encouraged  to  bring  up 
any  questions  or  comments  concerning  the  rating  task. 

Validity Data

Item  validity  data  had  been  obtained  for  each  of  the  103 
biodata  items  as  part  of  Project  A,  which  is  a  very  large 
criterion-related  validation  effort  intended  to  improve  the 
selection  and  classification  of  Army  enlisted  personnel  (Campbell, 
1987;  Eaton,  1984).  The  validity  sample  included  junior  Army 
enlisted  personnel  working  in  each  of  four  jobs  (radio  teletype, 
armor  crewman,  vehicle  operator,  and  administrative  specialist), 
and  validity  information  was  available  for  both  training  and  job 
performance  criteria.  Sample  sizes  range  from  700  to  2200  for 
each  job  in  the  training  performance  data  set  and  from  140  to  268 
per  job  in  the  job  performance  data  set. 

Within  each  of  these  data  sets,  item  validities  are  reported 
separately  for  a  number  of  criterion  constructs  or  measures 
developed as part of Project A. For example, the end-of-training
criteria for administrative specialists include: 1) Typing
learning rate, 2) Final typing speed, 3) Typing tasks, times
tested, and 4) Nontyping tasks, times tested. Job performance
criteria include five performance constructs for all jobs: Core
Technical Proficiency, General Task Proficiency,
Effort/Leadership/Self-development, Personal Discipline, and
Physical Fitness and Personal Appearance.

Procedure 

Ratings  of  the  biodata  items  were  analyzed  separately  for 
expert  and  student  groups.  Means,  standard  deviations,  and 
frequencies  were  computed  for  each  item  for  each  characteristic. 
Inter-rater reliabilities were also calculated for each of the
rated characteristics, separately for the individual rater and for
the group composite. Finally, correlations were computed between
item means for each of the characteristics, to examine both the
relationships  between  the  characteristics  and  the  relationship 
between  expert  and  student  ratings. 
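
The paper does not say which reliability estimator was used. As a minimal sketch, assuming the individual-rater reliability is estimated by the mean pairwise correlation between raters and the group-composite reliability by the Spearman-Brown formula, the two figures could be obtained as follows. The rating matrix and all function names are hypothetical illustrations, not the study's data or procedure.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson product-moment correlation between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation undefined for a constant sequence")
    return sxy / sqrt(sxx * syy)


def single_rater_reliability(ratings):
    """Mean correlation over all pairs of raters (assumed estimator)."""
    pairs = list(combinations(ratings, 2))
    return sum(pearson(a, b) for a, b in pairs) / len(pairs)


def composite_reliability(r_single, k):
    """Spearman-Brown reliability of the mean of k raters."""
    return k * r_single / (1 + (k - 1) * r_single)


# Hypothetical ratings: one row per rater, one column per item (5-point scale).
ratings = [
    [4, 2, 5, 1, 3],
    [5, 2, 4, 2, 3],
    [4, 1, 5, 1, 4],
]
r1 = single_rater_reliability(ratings)
rk = composite_reliability(r1, len(ratings))
print(f"single rater: {r1:.2f}, composite of {len(ratings)}: {rk:.2f}")
```

Under these assumptions the composite figure always exceeds the single-rater figure when raters agree positively, which is why the two are worth reporting separately.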

In  the  validity  data,  the  range,  mean,  and  standard  deviation 
of  the  item  validities  within  each  data  set  (i.e.  for  each 
criterion  measure  for  each  job)  were  calculated.  These  analyses 
provide  information  concerning  the  general  level  of  validity 
attained  by  the  items,  as  well  as  the  variance  of  the  validities. 

Correlations were computed between the mean ratings for each
item for each characteristic and the validities of the items for
each criterion/criterion construct. These correlations indicate
the degree to which an item's standing on a characteristic is
related to its validity. All correlational analyses were
conducted separately for expert and student raters; results for
each group were then compared.
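
The rating-validity analysis treats the item as the unit of observation: each item contributes one mean rating and one validity coefficient. A minimal sketch of that computation for a single characteristic and a single criterion is shown below, assuming an ordinary Pearson correlation across items. The ratings, validities, and function names are hypothetical and serve only to show the shape of the calculation.

```python
from math import sqrt


def item_means(ratings):
    """Mean rating per item; ratings[r][i] is rater r's rating of item i."""
    n_raters = len(ratings)
    return [sum(column) / n_raters for column in zip(*ratings)]


def pearson(x, y):
    """Pearson product-moment correlation between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation undefined for a constant sequence")
    return sxy / sqrt(sxx * syy)


# Hypothetical data: 3 raters x 5 items on one characteristic,
# plus one validity coefficient per item for one criterion.
ratings = [
    [4, 2, 5, 1, 3],
    [5, 2, 4, 2, 3],
    [4, 1, 5, 1, 4],
]
validities = [0.10, 0.03, 0.12, 0.01, 0.06]

means = item_means(ratings)
r = pearson(means, validities)
print(f"rating-validity correlation across items: {r:.2f}")
```

In the study this calculation would be repeated for each characteristic, each criterion, and each rater group, with 103 items per correlation.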

Results 

Item Ratings

Completed rating booklets were received from 22 members or
55% of the expert group and 12 members or 71% of the student
group. Comments from the raters indicated the ratings were
sometimes quite difficult to make, although difficulty apparently
varied across each item-characteristic judgment. Item means were
slightly lower for ratings of behavioral consistency than for
heterogeneity and behavioral discreteness, and rating standard
deviations were around 1.0 for each of the characteristics.
Overall,  the  ratings  ranged  from  1.33  to  4.83  (on  a  5  point 
scale),  and  most  means  were  between  2  and  4.  The  frequency 
information  suggested  that  most  ratings  were  fairly  normally 
distributed  about  the  mean. 

Ratings  of  each  of  the  characteristics  were  correlated 
somewhat with each other. Heterogeneity correlated -.27 with
Behavioral Discreteness and -.33 with Behavioral Consistency.
Discreteness correlated zero with Consistency. The ratings of
experts correlated highly with those of students: .78 for
heterogeneity, .89 for behavioral consistency, and .97 for
behavioral discreteness.

Inter-rater  reliabilities  for  the  ratings  are  shown  in  Table 
1.  In  general,  the  reliabilities  are  quite  good  and  suggest  the 
ratings  are  sufficiently  consistent  to  justify  relating  them  to 
the  item  validities.  Ratings  appear  to  be  most  reliable  for  the 
behavioral  discreteness  and  behavioral  consistency  characteristics 
and  are  less  reliable  for  heterogeneity.  Expert  and  student 
raters  attained  approximately  equal  levels  of  reliability. 

Item  Validities 

The  overall  absolute  level  of  the  item  validities  ranged  from 
a  low  of  zero  to  a  high  of  .43.  Validities  were  higher  and  had 
more  variance  in  the  job  performance  data  sets  than  in  the 
training  performance  data.  Mean  item  validities  were 
approximately  .08  with  a  standard  deviation  of  about  .06  in  the 
job  performance  data  sets  and  were  around  .04  with  a  standard 
deviation  of  about  .03  in  the  training  performance  data  sets. 

Correlations Between Ratings and Validities

Correlational  results  for  the  job  performance  criteria  are 
summarized  in  Tables  2  through  5.  These  tables  report 
correlations  between  item  mean  ratings  for  each  characteristic  and 
item  validities  for  each  of  five  performance  criteria.  Results 
are  therefore  an  index  of  the  relationship  between  ratings  of  an 
item’s  characteristics  and  the  item’s  validity  in  predicting 
various  dimensions  of  job  performance. 


Tables 2 (expert results) and 3 (student results) show the
correlations obtained when computed in the sample as a whole and
when computed separately within each job and then averaged.
Tables 4 and 5 present similar results, reported separately for
the administrative specialist job only and the armor crewman job
only. The correlations obtained for the administrative specialist
job are the highest of all four jobs, while results for the armor
crewman are the lowest and least consistent. Despite this
across-job variability in the level of correlation, the pattern of
correlation for each of the jobs is quite consistent with the
overall across-job average.

Several  findings  are  notable  in  the  correlational  results. 
First,  ratings  of  heterogeneity  are  negatively  correlated  with  the 
validity  of  the  items;  that  is,  items  that  were  rated  low  on 
heterogeneity  are  more  valid  than  items  judged  to  be  high  in 
heterogeneity.  Ratings  of  behavioral  discreteness  and  behavioral 
consistency  are  positively  correlated  with  item  validities.  Thus, 
items  rated  as  discrete  and  consistent  with  criterion  behavior 
tend  to  produce  higher  validities  than  items  of  a  more  evaluative 
or summary nature or items that function as a "sign" of criterion
behavior rather than a "sample".

The magnitude of the relationships between ratings and
validities is strongest for the criteria of Core Technical
Proficiency and General Task Proficiency. These performance
dimensions are known informally as the "can-do" criteria, as
opposed to "will-do" criteria such as Effort/Leadership, Personal
Discipline, or Physical Fitness. Relationships are also much
stronger  when  computed  within  the  administrative  specialist  job 
alone. Finally, comparison of results shows that both the pattern
and level of correlation are highly consistent for both expert and
student raters.

Findings from correlational analyses with the training
performance validities present a similar picture, as shown in
Tables 6 and 7. Results are averaged across the training criteria
within  each  job,  since  the  criteria  differed  by  job  and  it  was 
therefore  impossible  to  compare  results  across  jobs.  Averaging 
across  training  criteria  within  a  job  also  appears  reasonable 
since  the  criteria  are  similar  conceptually  (e.g.  typing  learning 
rate  and  final  typing  speed). 

As  with  the  job  performance  criteria,  correlations  are 
negative  between  training  validities  and  ratings  of  heterogeneity 
and  are  positive  between  validities  and  ratings  of  behavioral 
discreteness and behavioral consistency. The strongest
relationships are for the vehicle operator and administrative
specialist jobs, a result that is also found in the job
performance results. Finally, the pattern and level of
correlation obtained is again highly similar for expert and
student  raters,  although  the  expert  ratings  attained  slightly 
higher  relationships  with  the  training  validities. 


Discussion


Although preliminary, investigation results suggest that each
of  the  three  hypothesized  characteristics  (heterogeneity, 
behavioral  discreteness,  and  behavioral  consistency)  is  an 
important,  stable,  descriptor  of  biodata  items  and  their  ability 
to  predict  criteria.  Each  characteristic  was  rated  reliably  by 
both  expert  and  student  raters,  and  each  characteristic  correlated 
significantly  with  item  validities  for  both  job  performance  and 
training  performance  criteria.  Behavioral  consistency  appears  to 
be  the  item  characteristic  of  most  value  in  predicting  an  item’s 
validity,  especially  for  job  performance  criteria,  but  both 
heterogeneity  and  behavioral  discreteness  also  attained 
respectable  correlations  with  the  item  validities. 

The  direction  of  the  relationship  between  characteristics  and 
validities  is  as  hypothesized  for  behavioral  discreteness  and 
behavioral  consistency,  suggesting  that  items  that  are  both 
behaviorally  discrete  and  consistent  with  criterion  behavior  are 
likely  to  yield  the  best  validities.  For  heterogeneity,  the 
relationship  is  opposite  to  that  hypothesized,  suggesting  that 
items  that  are  less  heterogeneous  are  more  likely  to  produce 
validity.  This  finding  is  interesting  since  heterogeneity  was 
examined  at  the  item  level  rather  than  at  the  scale  or  inventory 
level  as  is  more  traditional  in  research.  Thus,  it  may  be  that 
heterogeneity is still desirable in a biodata instrument, but that
such heterogeneity is best attained by combining items that are
themselves  somewhat  homogeneous.  The  ratings  of  heterogeneity 
were  also  noticeably  less  reliable  than  for  the  other 
characteristics,  and  while  this  should  not  affect  the  direction  of 
its  relationship  with  validities,  the  heterogeneity  characteristic 
may  be  the  most  difficult  of  the  characteristics  to  interpret. 

It  is  interesting  that  the  relationship  between 
characteristics  and  validities  is  notably  stronger  in  the 
administrative  specialist  job  than  it  is  across  jobs.  This  is 
also  true  for  the  job  performance  criteria  of  Core  Technical 
Proficiency  and  General  Task  Proficiency.  A  possible  explanation 
for the administrative specialist finding is that the validities
for this job were obtained in a sample that was the largest of the
jobs included (N = 268 for job performance criteria and N = 2260
for training criteria). The sample sizes in the other jobs are
all at least reasonably large, however, so it appears other
factors are also involved. For the job performance criteria, the
two dimensions that are well predicted are both referred to as
"can-do" criteria, while the other dimensions are "will-do"
criteria.  Again,  however,  it  is  not  clear  why  this  finding  should 
be  obtained.  Future  research  to  extend  the  findings  of  this 
investigation  should  address  the  stability  of  characteristic- 
validity  relationships  across  both  jobs  and  criteria. 

Because this research is the first investigation attempted of
biodata item characteristics and their relationship to validity,
results must be viewed with caution. Only one set of biodata
items (aimed primarily at young adults) was included and the
criterion-related validities were gathered in the military rather
than  in  an  industrial  organization.  Nevertheless,  several 
strengths  of  the  investigation  suggest  the  findings  may  be 
relatively  stable  in  future  research. 

First,  the  item  set  employed  is  highly  diverse,  which  should 
contribute to an effective test of the rated characteristics.
Second, the level of validities attained by the items is
respectable, and even more important, the validities have
considerable variance. The validities were obtained in large
"real-world" samples, using criteria that had been carefully
developed. Finally, the results obtained are highly consistent
for two independent groups of raters and are also consistent for
two  independently  gathered  types  of  criteria  that  are  conceptually 
and  methodologically  distinct. 

Results  from  this  research  can  be  examined  from  both 
theoretical  and  applied  perspectives.  From  the  theoretical 
perspective,  this  investigation  suggests  conceptual  reasons  that 
may  underlie  the  predictive  ability  of  biographical  measures. 
Item  characteristics  examined  in  the  research  are  more  often 
characteristics  of  biodata  items  than  of  items  from  related 
domains; they may therefore be in part responsible for the
superiority of biodata prediction to that from other domains. The
investigation's findings also complement more content-related
biodata  research,  such  as  that  addressing  biodata  factors  like 
academic  achievement  or  early  home  environment.  A  taxonomy 
combining  both  the  characteristics  examined  in  this  research  and 
content-related  aspects  of  biodata  measures  may  be  of  great  value 
in  improving  conceptual  understanding  of  the  biodata  domain. 

From  the  applied  perspective,  findings  of  the  investigation 
address  the  optimal  construction  procedures  for  a  biodata 
instrument.  Items  developed  with  attention  to  the  characteristics 
studied  may  produce  higher  validities,  an  outcome  of  obvious 
applied  value.  This  value  may  be  increased  further  through 
combination  of  content  considerations  with  the  characteristics 
examined  here. 

A  conference  in  1965  of  the  nation’s  leading  biodata 
researchers  concluded: 

Aside  from  theoretical  academic  interest,  there 
were  no  very  persuasive  reasons  for  tackling  the 
(biodata  conceptual)  problem  until  a  ’prediction 
plateau’  developed.  It  seems  apparent  now  that 
increased  efficiency  will  occur  only  when  we  learn  more 
about  the  causal  relationships  underlying  predictive 
items  (Henry,  1966,  p.  248). 


Future research to replicate and extend both theoretical and
practical  aspects  of  this  investigation  will  hopefully  be  of  value 
in  contributing  to  this  continuing  goal. 


[Tables 1 through 5 are not legible in the source. Surviving fragments, in page order:]

Title fragment: "... for Judgments of Biodata Item Characteristics"

Behavioral Consistency    .19*   .26**   .20*    .00    .32**

Title fragment: "... and Validities: Job ..."

Behavioral Consistency    .19*   .28**   .12    -.06    .24*

Behavioral Consistency    .01    .16     .13     .00    .39**

Title fragment: "Correlation Between Student Raters and Validities: Job Perfo..."

Behavioral Consistency    .03    .13    -.05    -.13    .30**


Table 6

Average Correlation Across Criterion Measures Between Expert Ratings and
Validities: Training Performance

                            Radio       Armor       Vehicle     Administrative
                            Teletype    Crewman     Operator    Specialist
                            N = 726     N = 1642    N = 1076    N = 2260

Heterogeneity               [illegible]  -.11        -.26**      -.29**
Behavioral Discreteness      .09         .13         .31**       .35**
Behavioral Consistency       .12         .12         .28**       .31**


Table 7

                            Radio       Armor       Vehicle     Administrative
                            Teletype    Crewman     Operator    Specialist
                            N = 726     N = 1642    N = 1076    N = 2260

Heterogeneity               -.13        -.10        -.17        -.26**
Behavioral Discreteness      .08         .13         .21*        .36**
Behavioral Consistency       .10         .12         .29**       .27**

** p < .01
 * p < .05


References


Barge, B. N., & Hough, L. M. (1984). Utility of biographical
data: A review and integration of the literature.
Minneapolis, MN: Personnel Decisions Research Institute.

Campbell, J. P. (Ed.). (1987). Improving the selection,
classification, and utilization of Army enlisted personnel:
Annual report, 1986 fiscal year (HumRRO IR-PRD-87-10) (ARI
Technical Report in preparation).

Davis, K. R. (1984). A longitudinal analysis of biographical
subgroups using Owens' developmental-integrative model.
Personnel Psychology, 1-14.

Eaton, N. K. (1984, May). The U.S. Army research project to
improve selection and classification decisions. Paper
presented at the National Security Industrial Association
Conference on Personnel and Training Factors in Systems
Effectiveness, Springfield, VA.

Eberhardt, B. J., & Muchinsky, P. M. (1982a). An empirical
investigation of the factor stability of Owens' Biographical
Questionnaire. Journal of Applied Psychology, 67, 138-145.

Eberhardt, B. J., & Muchinsky, P. M. (1982b). Biodata
determinants of vocational typology: An integration of two
paradigms. Journal of Applied Psychology, 67, 714-727.

Eberhardt, B. J., & Muchinsky, P. M. (1984). Structural
validation of Holland's hexagonal model: Vocational
classification through the use of biodata. Journal of
Applied Psychology, 69, 174-181.

Ghiselli, E. E. (1955). The measurement of occupational
aptitude. Berkeley, CA: University of California Press.

Henry, E. R. (Chrmn.) (1966). Research conference on the use of
autobiographical data as psychological predictors.
Greensboro, NC: The Creativity Research Institute, The
Richardson Foundation.

Lautenschlager, G. J., & Shaffer, G. S. (1987). Reexamining the
component stability of Owens' Biographical Questionnaire.
Journal of Applied Psychology, 72, 149-152.

Matteson, M. T. (1978). An alternative approach to using
biographical data for predicting job success. Journal of
Occupational Psychology, 51, 155-162.

Mitchell, T. W., & Klimoski, R. J. (1982). Is it rational to be
empirical? A test of methods for scoring biographical data.
Journal of Applied Psychology, 67, 411-418.

Owens, W. A. (1976). Background data. In M. D. Dunnette (Ed.),
Handbook of industrial and organizational psychology.
Chicago: Rand McNally.

Owens, W. A., & Schoenfeldt, L. F. (1979). Towards a
classification of persons. Journal of Applied Psychology
Monograph, 64, 569-607.

Pannone, R. D. (1984). Predicting test performance: A content
valid approach to screening applicants. Personnel
Psychology, 37, 507-514.

Reilly, R. R., & Chao, G. T. (1982). Validity and fairness of
some alternative employee selection procedures. Personnel
Psychology, 35, 1-62.




A  TASK-BASED  APPROACH  FOR  IDENTIFYING 


JUNIOR  NONCOMMISSIONED  OFFICERS'  KEY  RESPONSIBILITIES 


Ilene  F.  Gast 

U.S.  Army  Research  Institute 

Charlotte  H.  Campbell 
Human  Resources  Research  Organization 

Alma  G.  Steinberg 
U.S.  Army  Research  Institute 

Daniel  A.  McGarvey 
American  Institutes  for  Research 


Presented  on  Symposium, 

"Junior  Noncommissioned  Officer  Job  Requirements: 
Where  Does  Leadership  Fit  In?" 

At  the  Annual  Convention  of  the 
American  Psychological  Association 
New  York 

August  1987 


The  views  expressed  in  this  paper  are  those  of  the  authors  and  do  not 
necessarily  reflect  the  official  opinions  and  policies  of  the  U.S.  Army 
Research  Institute  or  the  Department  of  the  Army. 


This paper describes research performed under Project A: Improving the
Selection, Classification, and Utilization of Army Enlisted Personnel. This
program is designed to provide the information and procedures required to
meet the military manpower challenge of the future by enabling the Army to
enlist, allocate, and retain the most qualified soldiers. The research is
funded primarily by Army Project Number 2Q263731A792 and is conducted under
the direction of the U.S. Army Research Institute for the Behavioral and
Social Sciences. Research scientists from the U.S. Army Research Institute
for the Behavioral and Social Sciences, the Human Resources Research
Organization, the American Institutes for Research, and the Personnel
Decisions Research Institute, as well as many Army officers and enlisted
personnel, are participating in this effort.

This research was funded by the U.S. Army Research Institute for the
Behavioral and Social Sciences, Contract No. MDA903-82-C-0531. All
statements expressed in this paper are those of the authors and do not
necessarily express the official opinions or policies of the U.S. Army
Research Institute or the Department of the Army.


A  Task-Based  Approach  for  Identifying 
Junior NCOs' Key Responsibilities

Introduction 

Project A, Improving the Selection and Classification of Army Enlisted
Personnel, has begun to address the prediction of long range criteria. One
of the challenges faced by project personnel is the development of
appropriate measures of second tour job knowledge and hands-on performance.

As soldiers enter their second tour of duty, their jobs change dramatically.
During their second tour, soldiers begin assuming supervisory
responsibilities, while also retaining their technical duties. As Dr. Rumsey
mentioned, unlike technical tasks, many of these new supervisory activities
cannot be translated into discrete, proceduralizable tasks and therefore are
not amenable to the same kinds of job analytic procedures. For similar
reasons, the measurement strategies developed for the technical first-tour
tasks, hands-on and job knowledge tests covering specific tasks, would
capture second tour performance insufficiently. For these reasons, we
designed a job analytic approach incorporating multiple data sources to
determine the appropriate mix of supervisory and technical skills, ensure
adequate coverage of both domains, and provide insight into suitable
measurement procedures.

Today, I will describe how we integrated data from three sources to
develop a comprehensive performance domain for each of nine Army jobs. I
will then describe some general differences we found in the composition of
first and second tour jobs. In addition, I will discuss any specific
differences among the selected Army jobs in the importance of supervisory
activities. Finally, I will address practical implications for the
development of "task-based" second tour performance measures.

Method 

Developing the Task Domains

Our first step was assembling a comprehensive list of job-specific tasks
within each of nine Army occupations. Included were: infantryman (11B),
cannon crewmember (13B), tank crewmember (19E), single-channel radio operator
(31C), light wheel vehicle mechanic (63B), motor transport operator
(64C/88M), administrative specialist (71L), medical specialist (91A/B), and
military police (95B). Three separate job analytic procedures were employed
to generate tasks to be included in this list. Brief descriptions of each
are provided in the following section.

Technical Components of the Junior NCO Job

I'll begin by discussing the process we used to describe the technical
portion of junior non-commissioned officers' (NCOs') jobs. For each of the
nine jobs being studied, definition of the junior NCO job domain began with
the Soldier's Manuals for the job. Soldier's Manuals are prepared by Army
agencies for every job and for every skill level within the job. They not
only list the tasks required, but also the conditions under which they are
performed, the steps required for performance, and the performance standards.
The Army also expects soldiers to be proficient on the tasks in the Soldiers'
Manual of Common Tasks, which likewise includes tasks, conditions, steps, and
standards for basic soldiering tasks at each skill level (tasks such as map
reading, basic first aid, and operation of individual weapons). The junior
NCO, who is a Skill Level 2 soldier, is held responsible for all Skill Level
1 and 2 tasks in both the job-specific and common task manuals.

We also used data from the Army Occupational Survey Programs (AOSPs) in
defining the job domains. These surveys, which list hundreds of task
statements for each job, are administered periodically to representative
samples of soldiers at every skill level of each job; analyses of the data
include the percent of soldiers at each skill level who report that they
perform the tasks. The list was screened to eliminate statements not
performed by Skill Level 2 soldiers, and the surviving AOSP statements were
then mapped onto the task list defined by reference to the Soldier's Manuals.
Any AOSP statements that were not thus subsumed under Soldier's Manual tasks
were added to the domains. In so doing, we often found that higher skill
level tasks were performed by significant numbers of Skill Level 2 soldiers,
and were therefore considered to be a part of the job domain in hand.
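
The screening and mapping steps described above can be sketched in Python.
This is only an illustration under stated assumptions: the paper gives no
numeric cutoff for "performed by Skill Level 2 soldiers," so the 10 percent
threshold, the function name, and the example task statements are all
hypothetical.

```python
def build_domain(manual_tasks, aosp_percent_sl2, aosp_to_manual, min_percent=10.0):
    """Return a job domain: manual tasks plus AOSP statements not subsumed by them.

    manual_tasks: list of Soldier's Manual task names.
    aosp_percent_sl2: dict of AOSP statement -> percent of Skill Level 2
        soldiers who report performing it.
    aosp_to_manual: dict of AOSP statement -> manual task that subsumes it
        (statements with no counterpart are simply absent).
    min_percent: hypothetical screening cutoff (not given in the paper).
    """
    domain = list(manual_tasks)
    for statement, percent in aosp_percent_sl2.items():
        if percent < min_percent:
            continue  # screened out: not performed at Skill Level 2
        if aosp_to_manual.get(statement) in manual_tasks:
            continue  # already covered by a Soldier's Manual task
        domain.append(statement)  # unmatched statement joins the domain
    return domain


# Hypothetical example data, for illustration only.
manual = ["Read a map", "Perform first aid"]
percent = {
    "Navigate with map": 80.0,
    "Prepare duty roster": 35.0,
    "Plan battalion operation": 2.0,
}
mapping = {"Navigate with map": "Read a map"}
print(build_domain(manual, percent, mapping))
```

In this sketch a higher skill level statement such as "Prepare duty roster"
enters the domain simply because enough Skill Level 2 soldiers report
performing it, which mirrors the finding noted above.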

With this domain list in hand, we visited the Army agencies responsible
for training and doctrine in each job, and requested their review of the
list. We asked for their input concerning the completeness and accuracy of
the list, and also found out from them whether any of the tasks were likely
to be eliminated soon because of equipment or doctrine changes, or whether
other tasks should be added for similar reasons. After they had completed
their review and given their concurrence on the doctrinal accuracy of the
domain, we considered domain definition of the technical tasks complete.
(The process parallels that used in defining the Skill Level 1 domains, for
which only Skill Level 1 Soldier's Manuals were used in the initial step.
Details may be found in Campbell, Campbell, Rumsey & Edwards, 1985).

Supervisory Components of the Junior NCO Job

In developing the supervisory portion of the job domains, we took
advantage of existing research on Army supervision and leadership. Two
separate approaches were incorporated to form the supervisory component of
the job domains. The first approach yielded the Supervisory Responsibility
Questionnaire (SRQ). It was based on critical incidents describing working
relationships between first term soldiers and their NCO supervisors. The
second approach was an adaptation of the Leader Requirements Survey (LRS), an
extensive interview-based task list. By combining these two instruments we
were able to take advantage of two job analytic techniques and two different
slices of the supervisory domain. Next, I will discuss the development of
the SRQ and LRS and describe how they were combined to define the second tour
supervisory domain.

The Supervisory Responsibility Questionnaire (SRQ): A Behavioral
Example Based Task List. The SRQ was the byproduct of previous Project A
research which examined possible moderating effects of supervision on the
relationship between soldiers' pre-enlistment attributes (such as aptitude
and temperament) and their job performance (White, Gast & Rumsey, 1985;
Hough, Gast & White, 1986). As part of this research, critical incidents
had been collected in order to determine what supervisory behaviors made a
difference to the performance of first tour soldiers.


The incidents were written by 60 NCO subject matter experts (SMEs) from
five of the Project A target jobs (11B, 19E, 31C, 63B, and 91A). The SMEs
were asked to recall from their experiences as first tour soldiers examples
of how their supervisors had been particularly effective or particularly
ineffective. In all, the SMEs generated over 400 behavioral examples. Next,
a retranslation was conducted in which a second group of 31 SMEs, who were
familiar with Army leadership requirements, were asked to classify the
examples into Yukl's 13-dimension taxonomy of supervisory behavior (Yukl,
1984). At the same time, these examples were classified by two ARI staff
psychologists. As a result of retranslation, 9 of the 13 Yukl dimensions
remained. (See Table 1.)

The SRQ was constructed from a subset of these incidents and their
respective categories. First, all incidents that were not categorized into a
single dimension by both SMEs and psychologists were eliminated from
consideration, as were multiple incidents referring to a single task or
behavior. The incident list was further reduced by excluding incidents not
describing a specific task (e.g., "The soldier fell asleep while on guard
duty. [The supervisor] walked up to the sleeping soldier and scared him."). In
the end, a total of 34 behavioral statements were written to represent 8 of
Yukl's original categories. No statements were written for one category, Act
as Role Model, because the incidents grouped under that category were not
rich enough to extract critical supervisory tasks.

One interesting facet of the SRQ is the use of critical incidents as the
basis for task statements, which is not typical. The critical incident
technique, a behavioral job analytic procedure, is typically used for
developing broader behavioral dimensions. (See Pulakos, Hanson, Borman,
Hallam, Carter & Owens-Kurtz, 1987).

Because the incidents comprising the SRQ tasks were primarily concerned
with relationships between supervisors and their subordinates, the SRQ had
built-in limitations in its coverage of the supervisory domain. To ensure
all important aspects of the supervisory domain were included, a supplemental
task list was needed.

The Leader Requirements Survey (LRS): Interview-Based Task List. The
second approach incorporated the Leader Requirements Survey (LRS), which was
originally designed to provide the Army's proponents for leadership with
information about the leadership job requirements of Army commissioned and
noncommissioned officers. The LRS was designed to identify the sequential
and progressive nature of commissioned and noncommissioned officer leadership
(second lieutenant through colonel, and sergeant through command sergeant
major), and contains items which cover the leadership domain of all these
organizational levels. In addition, it embodies the full range of leadership
tasks across all Army branches.

This task list was constructed through an iterative interview strategy.
Several hundred interviews were conducted. Typically, 6-8 SMEs were
interviewed at a time and interviews lasted for approximately 90 minutes.
Interviewees were asked to describe their job, focusing particularly on what
they did to influence others to accomplish their mission (i.e., the Army
definition of military leadership as documented in FM 22-100) and especially
those leadership tasks that differentiated their jobs from those performed by
others in higher or lower ranks than themselves.

In order to ensure that the resulting task list both completely
encompassed the domain of military leadership and was worded in terms
commonly employed by job incumbents, each successive group of SMEs was shown
the leadership tasks developed by the previous group and asked to comment on
these tasks. These iterative interviews were conducted until new groups of
SMEs no longer added new tasks and were comfortable with the
wording of tasks already collected.

Content validation of the task list was achieved through reviews by two
separate groups of Army leadership proponents: the Center for Army Leadership
(CAL) and the U.S. Army Sergeants Major Academy (USASMA). Consensus on the
final list of tasks comprising the LRS was reached by a review committee
consisting of representatives from CAL, USASMA, and ARI. The resulting Leader
Requirements Task List contains tasks in the following broad categories:
Train, Teach, and Develop (146 tasks); Motivate (170 tasks); Manage (86
tasks); and Provide Direction (158 tasks), for a total of 560 tasks. Table 2
lists the number of tasks corresponding to each content area. In the present
research, 25 of the tasks in the category "Provide Direction," coming under
the sub-heading of "Provide Input for the Direction of the Larger
Organization," were dropped because they contained tasks performed by
higher-level commissioned officers only. (See Steinberg, 1986, and Steinberg,
van Rijn & Hunter, 1986, for more information on the LRS.)


The Supervisory Responsibility Questionnaire gave us information on
working relationships between first-line supervisors and their subordinates,
as perceived by the subordinates. The much longer Leader Requirements Survey
included activities involving peers and superiors, as well as administrative
duties, but was designed to cover these activities across all supervisory
levels within all Army branches, from junior NCOs through full Colonels. In
order to determine which of the activities on the LRS should be a part of the
domains for junior NCOs, and to verify the tasks on the SRQ as appropriate
tasks for the domains, both questionnaires were administered to NCOs in the
nine jobs. Approximately 50 NCOs received the LRS, and 125 NCOs received the
SRQ. For each questionnaire, the NCOs were asked to indicate how important
each task is in performance of the E5's¹ job; on the SRQ, they were also
asked to indicate how frequently each task is performed.

Analysis of the SRQ data confirmed that all the tasks were sufficiently
important, across a variety of Army jobs, to be retained as part of the
junior NCO domain. The LRS importance data were used to select tasks that
over half of the respondents indicated were absolutely essential to the E5's
job. Additional highly rated tasks were also selected from any of the 19 LRS
dimensions not already represented by at least two tasks. Ultimately, two
LRS dimensions were eliminated from the domain because they failed to meet
the importance criteria. By this process, 53 tasks were selected from the
LRS to be considered for the job domains.

¹E5 is the Army paygrade at which Army doctrine specifies that soldiers
become noncommissioned officers and can assume supervisory responsibilities.
E5 is also the first paygrade at which a soldier is classified as Skill Level
2. We were particularly concerned with the jobs of E5 soldiers because it
was projected that E5 would be the most common paygrade within our second
tour Project A sample, and thus was designated as the target group for whom
we would develop our measures.
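
The two-stage LRS selection rule (tasks rated absolutely essential by over
half of the respondents, then a top-up for under-represented dimensions) can
be sketched as follows. This is a hedged illustration only: the 5-point
coding with 5 as "absolutely essential," the mean-rating floor used to define
"highly rated," and all task and dimension names are assumptions that do not
come from the paper.

```python
def select_lrs_tasks(ratings, dimension_of, essential=5, floor=4.0, min_per_dim=2):
    """Select tasks by majority 'essential' ratings, then top up thin dimensions.

    ratings: dict of task -> list of importance ratings from respondents.
    dimension_of: dict of task -> LRS dimension name.
    essential: rating value assumed to mean 'absolutely essential'.
    floor: assumed minimum mean rating for a top-up task to count as highly rated.
    min_per_dim: dimensions with fewer selected tasks than this get topped up.
    """
    share = {t: sum(r >= essential for r in rs) / len(rs) for t, rs in ratings.items()}
    mean = {t: sum(rs) / len(rs) for t, rs in ratings.items()}

    # Stage 1: tasks that over half of the respondents rated essential.
    selected = [t for t in ratings if share[t] > 0.5]

    # Stage 2: add highly rated tasks to dimensions with fewer than two tasks.
    for dim in sorted(set(dimension_of.values())):
        chosen = [t for t in selected if dimension_of[t] == dim]
        candidates = sorted(
            (t for t in ratings
             if dimension_of[t] == dim and t not in chosen and mean[t] >= floor),
            key=lambda t: mean[t],
            reverse=True,
        )
        selected.extend(candidates[: max(0, min_per_dim - len(chosen))])
    return selected


# Hypothetical example data, for illustration only.
ratings = {
    "Counsel subordinates": [5, 5, 5, 4],
    "Reward good work": [5, 4, 4, 4],
    "Brief commander": [3, 2, 3, 2],
}
dimension_of = {
    "Counsel subordinates": "Motivate",
    "Reward good work": "Motivate",
    "Brief commander": "Provide Direction",
}
print(select_lrs_tasks(ratings, dimension_of))
```

A dimension whose tasks all fall below the floor contributes nothing, which
corresponds to the dimensions that were eliminated for failing the importance
criteria.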

Content analysis of the two task lists (34 tasks from the SRQ and 53
tasks from the LRS) resulted in a single list of 46 tasks that
incorporated all of the activities on both lists. Of those 46, the 34 tasks
from the SRQ were included and 6 of the LRS tasks; 4 new task statements were
prepared to cover two or more LRS statements each.

The 46 task statements were further examined by reference to the
categories used for the original SRQ. Eight categories evolved for the 46
tasks, shown in Table 3. These tasks, clustered as shown, were added to the
Skill Level 2 job domain for each of the nine jobs.

Refining the Job Domains

After the job domains were thus defined, every domain included over 200
tasks. We wanted to select smaller samples of tasks to represent each of the
domains, samples that would include the most critical tasks for the jobs and
that would have a sufficient range of performance difficulty to permit some
discrimination among soldiers. In order to do this, more information was
needed; specifically, we needed judgments of task criticality and performance
difficulty.

The Army agency responsible for each job was asked to designate 30 job
experts: officers or NCOs in that military specialty who had recent field
experience supervising E5s in the job. Half of the job experts rated the
tasks for a hypothetical E5 soldier who had between three and five years of
service; half were given another task not directly related to the topic under
discussion today.

For the importance judgments, the experts were given one of three
scenarios, and asked to rate (on a 5-point scale) the importance of the task
in accomplishing the unit's mission under that scenario. The three scenarios
described either combat conditions (European, non-nuclear), increasing
tensions (European, with a high state of training and strategic readiness,
but short of actual conflict), or a garrison environment (stateside, with
training as the primary activity and mission). In all, we collected 10
ratings for each paygrade/scenario combination, for a total of 30 sets of
ratings per job. The importance ratings were averaged across the 10 experts
in each rating condition to yield 3 importance scores. To obtain an
indication of expected task performance distribution, the experts were asked
to sort a "typical" group of ten hypothetical soldiers into five performance
categories based on how they would expect soldiers to be able to perform on
each task. Task difficulty was then computed as the mean of the
distribution of the ten soldiers, averaged across the experts. Task
performance variability was computed as the standard deviation of the
distribution of the ten soldiers, averaged across experts. (This procedure,
for both importance and difficulty ratings, was developed and used for the
Skill Level 1 job analysis, and is described in more detail in Campbell et
al., 1985.)
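
The three summary statistics described above can be sketched in Python. This
is a minimal illustration rather than the project's actual scoring code: the
coding of the five performance categories as 1 through 5, the use of the
population (rather than sample) standard deviation, and the example numbers
are all assumptions.

```python
from statistics import mean, pstdev


def importance_scores(ratings_by_scenario):
    """Average the experts' 5-point ratings within each scenario."""
    return {scenario: mean(r) for scenario, r in ratings_by_scenario.items()}


def expand(counts):
    """Turn category counts, e.g. [1, 2, 4, 2, 1], into one score per soldier."""
    return [category for category, n in enumerate(counts, start=1) for _ in range(n)]


def difficulty_and_variability(sorts):
    """Summarize experts' sorts of ten hypothetical soldiers for one task.

    sorts: one list per expert giving how many of the ten soldiers were
        placed in each of the five performance categories.
    Returns (difficulty, variability): the mean and the standard deviation
    of each expert's distribution, each averaged across experts.
    """
    scores = [expand(counts) for counts in sorts]
    difficulty = mean(mean(s) for s in scores)
    variability = mean(pstdev(s) for s in scores)
    return difficulty, variability


# Hypothetical example data: two experts rating one task.
print(importance_scores({"combat": [5, 4], "tensions": [4, 4], "garrison": [3, 3]}))
print(difficulty_and_variability([[1, 2, 4, 2, 1], [0, 2, 6, 2, 0]]))
```

Keeping one importance score per scenario, rather than pooling the three,
preserves the distinction between combat, increasing tensions, and garrison
conditions that the rating design was built around.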


Selecting  Tasks  for  Measurement 

The last step before designing hands-on and job knowledge measures for
each of the job domains is selecting a subgroup of tasks for measurement.

Even as we speak, that task selection process is taking place. The Army
agencies for each job have again been asked to provide six job experts with
recent field experience; one Project A staff member will also serve on the
task selection panel. The information to be presented for their
consideration includes the tasks, clustered; the importance rating for each
task for E5s; the performance difficulty and variability for each task for
E5s; and the performance frequency for each task, drawn from the Army
Occupational Survey Program analyses. The panel will eventually agree on 45
tasks to be selected for each job, 30 technical tasks and 15 supervisory
tasks. To guide their selection so that every cluster is represented,
targets are set proportionally for each cluster.
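
One way to set proportional cluster targets that sum to a fixed total while
leaving no cluster empty is sketched below. The paper does not say how the
targets were actually computed, so the method (one guaranteed task per
cluster, then largest-remainder rounding) and the cluster sizes in the
example are assumptions made purely for illustration.

```python
def cluster_targets(cluster_sizes, total):
    """Allocate `total` task slots across clusters in proportion to their size.

    cluster_sizes: dict of cluster name -> number of tasks in that cluster.
    total: number of tasks to select (e.g. 30 technical or 15 supervisory).
    Every cluster receives at least one slot; the rest are shared out
    proportionally, with fractional parts resolved by largest remainder.
    """
    k = len(cluster_sizes)
    if total < k:
        raise ValueError("need at least one slot per cluster")
    n_tasks = sum(cluster_sizes.values())
    remaining = total - k  # slots left after one guaranteed slot per cluster
    quotas = {c: remaining * n / n_tasks for c, n in cluster_sizes.items()}
    targets = {c: 1 + int(q) for c, q in quotas.items()}
    leftover = total - sum(targets.values())
    # Hand out slots lost to rounding, largest fractional part first.
    by_remainder = sorted(quotas, key=lambda c: quotas[c] - int(quotas[c]), reverse=True)
    for c in by_remainder[:leftover]:
        targets[c] += 1
    return targets


# Hypothetical supervisory cluster sizes, for illustration only.
sizes = {"Plan, Organize, Monitor": 20, "Train, Develop": 10, "Support": 10}
print(cluster_targets(sizes, 15))
```

Under this scheme the targets are guides for the panel rather than hard
constraints, consistent with the panel's role in agreeing on the final tasks.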

Analyses 

The analyses addressed general differences between first and second tour
jobs and differences in supervisory requirements across jobs. Two sources of
data were considered: (1) the second tour job analysis data described in this
paper and (2) clusters derived from first tour job analyses described by
Campbell et al., 1985.

Analyses were largely descriptive because, with the exception of the SRQ
data, domains were not directly comparable between first and second tour nor
across jobs. Further, these analyses were preliminary; as we move from task
selection to task measurement we will be using these data to answer questions
about the best way to capture performance on specific tasks. In order to
examine general differences between first and second tour domains, we began
with a general comparison of the content of these two sets of domains for
each MOS. We noted trends for changes in cluster composition, the addition
of new clusters, and the deletion of existing clusters from first to second
tour. To examine differences among the occupations in supervisory
responsibilities we assessed job-specific additions to the core SRQ clusters
and importance ratings for each of the augmented clusters. Finally, we
compared differences in importance ratings across occupations for the common,
technical, and supervisory cluster groupings. Our research questions follow.

1. How much overlap is there between first and second tour job
dimensions?

2. Do jobs increase in complexity? What are the indications based on
comparison of first and second tour domains?

3. What is the balance between supervisory and technical tasks in the
second tour? Does this balance vary across MOS?

4. In which job(s) were supervisory activities judged to be the most
important? The least important?

5. Did specific supervisory activities differ in importance across the
jobs?

Results 

Changes in Domains from First to Second Tour: Common and Technical Tasks

Our first step was to examine changes in clusters of "Common Tasks,"
which are shared to some extent across MOS, and non-shared job-specific
"technical" tasks. In terms of the dichotomy presented by Dr. Rumsey between
supervisory and technical tasks, both of these would represent subsets of the
"technical" category. At the time we prepared our analyses, the databases
for two of the jobs were still under preparation. Table 4 compares the first
and second tour job domains across seven jobs in terms of (1) the number of
clusters included and (2) the number of tasks included. In all but two
instances (infantryman and motor transport operator) the number of clusters
needed to describe the domain increased from first tour to second tour jobs.
Moreover, in all cases, there was an increase in the number of tasks needed
to describe each domain.

Differences in technical clusters across jobs. Although first and
second tour technical tasks and clusters overlap considerably, differences
between the two sets of domains emerged. Specifically, within the four
non-combat jobs (radio operator, motor transport operator, administrative
specialist, and medical specialist), tasks in three of the shared clusters
realigned themselves. By contrast, there was little change in the combat
jobs (infantryman, cannon crewman, or tank crewmember), with the least amount
of change for the infantry.

The addition of tasks also caused several of the technical clusters to
split into better differentiated groups of tasks. A more important change
was the addition of an Operations (or Tactical Supervision) category in four
of the seven jobs (infantry, tank crewmember, radio operator, and
administrative specialist) and expansion of that category in a fifth (cannon
crewmember). Further, the two jobs (radio operator and medical specialist)
not acquiring an Operations cluster gained a separate Administrative cluster.
(See Table 5.)

Supervision as a Component of Second Tour Job Domains

In addition to acquiring new kinds of job specific technical
responsibilities, each job acquired supervisory duties. We mentioned earlier
that the 46-item SRQ was appended to each domain prior to domain review. As a
result of domain review, many supervisory tasks from the AOSP and Soldier's
Manuals were grouped under existing SRQ clusters. Table 5 shows the number
of AOSP/SM tasks added to each SRQ cluster by occupation. The bulk of the
tasks were added to three of the clusters: (1) Plan, Organize, Monitor; (2)
Provide Information; and (3) Train, Develop. The remaining five SRQ clusters
remained fairly stable across occupations. Within the three most augmented
clusters, new SRQ tasks were not evenly distributed across the different
jobs. Four of the occupations (tank crewmember, radio operator,
administrative specialist, and medical specialist) gained proportionally more
tasks than the other occupations. Thus, domain review served to augment the
SRQ, albeit unevenly.

Our next step was to assess the relative importance of the new
supervisory responsibilities for soldiers in the E5 paygrade across the
various jobs. Past experience and preliminary analyses conducted last year
suggested that not only will the overall importance of supervision vary
across jobs, but specific supervisory activities will vary in importance
across jobs. Others postulated that E5 jobs were largely technical and that
Project A need not be concerned with measuring supervisory tasks.


At the time this paper was prepared, task importance data were available
for six of the nine occupations. (See Table 6.) With few exceptions, the
data ran counter to our expectations. First, with the exception of the
medical specialists, supervisory activities were judged to be fairly
important across all jobs examined. The medical specialists' responses were
consistently lower; however, if one were to add a constant of .6 to all
dimensions (except discipline/punish) one would find the cluster means within
the ranges produced by the other jobs.

Second, the means presented in Table 6 suggest that specific supervisory
activities are not differentially important within each occupation. With one
striking exception, supervisory clusters tended to have similar importance
ratings within each job. Across the board, the cluster "Act as a Role Model"
was first (or second, for one job) in importance. Within that category,
"leading by example" was the most important task.

A third unanticipated but welcome finding was the relative balance in
the importance of supervisory and technical tasks across all jobs. (Again,
ratings for the medical specialists were consistently lower across all facets
of the domain than ratings for other occupations.)

Discussion 

In many ways, second tour jobs are more complex than their first tour
counterparts. As more tasks enter the job domains, the clusters become more
clearly differentiated. Not only are soldiers doing more of the same types
of tasks but they are also acquiring new responsibilities, particularly in
the areas of task-specific supervision and administrative responsibilities.
In addition, these new supervisory responsibilities are considered to be
important within each of the job domains.

The final version of the SRQ was instrumental in capturing these
supervisory responsibilities. Without it we would have missed important
facets of second tour jobs. The Soldier's Manuals and AOSP surveys would
have done, at best, an uneven job of representing the supervisory portion of
the domains.

We also found that supervisory tasks were seen as fairly important
within all of the jobs. While more analyses of the data collected are needed
to systematically explore patterns of differences between jobs, even a
cursory examination of task means by occupation reveals that many of the
expected differences within and between occupations did not arise. However,
what did emerge was a balance among technical and supervisory aspects of the
domain. Nevertheless, based on our results, it would be difficult to
conclude that supervisory activities are important within some occupation(s)
and not others. Further, the data provide little evidence that tasks were
differentially important in specific jobs.

However, the job analyses did not provide as much information about job
specific differences in supervisory activities as had been expected.
Therefore, we will be totally dependent on our next phase to determine which
aspects of supervision will be measured for each MOS. During task selection,
SMEs will be forced to prioritize tasks with the knowledge that only the top
45 tasks will be considered for measurement purposes.

As a final note, this research is consistent with past research in
leadership and supervision. First, support is provided for Yukl's taxonomy
of leadership (1984). The categories he proposed held across two different
job analytic techniques and across several Army occupations. Second, the
importance ratings that the SMEs gave to the "Act as Role Model" category are
consonant with Bass's (1985) transformational leadership theory. According
to Bass, leaders who can draw on their own informal sources of power (e.g.,
being a good model) to motivate others are more effective than those who rely
on organizational incentives. The overwhelming agreement on the importance
of leading by example typifies this kind of leadership.




Table 1

Comparison between Yukl's (1984) Categories of Supervisory Behavior and
SRQ Categories


Yukl's (1984) Categories

 1. Planning and Organizing
 2. Monitoring
 3. Problem Solving*
 4. Leading by Example
 5. Recognizing and Rewarding
 6. Training and Developing
 7. Informing
 8. Delegating/Participation*
 9. Supporting
10. Disciplining/Punishing
11. Representing*
12. Promoting Teamwork*
13. Clarifying Roles and Expectations


SRQ Final Categories

 1. Plan, Organize, Monitor
 2. Act as Role Model
 3. Recognize, Reward
 4. Train, Develop
 5. Provide Information
 6. Support
 7. Discipline, Punish
 8. Clarify Roles, Provide Feedback


Note. *Deleted from list as a result of retranslation (White, Gast, &
Rumsey, 1985).




Table 2

Leader Requirements Survey (LRS): Number of Tasks by Content Area


Content Area                                            Number of Tasks

TRAIN, TEACH, DEVELOP
  Train Soldiers                                               21
  Teach Soldiers                                               18
  Develop Leaders                                              21
  Plan & Conduct Training                                      42
  Train in the Field to Enter Combat                           44

MOTIVATE
  Motivate Others (The What)                                   13
  Motivate Others (The How)                                    42
  Develop Unit Cohesion                                        52
  Reward & Discipline Subordinates                             30
  Take Care of Soldiers                                        33

MANAGE
  Manage Resources                                             40
  Perform/Supervise Administrative Functions                   26
  Coordinate with Others Outside the Unit                      20

PROVIDE DIRECTION
  Supervise Others                                             20
  Maintain Two-Way Information Exchange with Subordinates      21
  Maintain Two-Way Information Exchange with Superiors         17
  Monitor and Evaluate Performance                             38
  Conduct Counseling                                           24
  Establish Direction of the Unit/Element                      13
  Provide Input for the Direction of the Larger Organization   25

TOTAL                                                         560




Table 3

Supervisory Categories and Tasks Derived From Combining the SRQ and LRS


PLAN, ORGANIZE, MONITOR
  Check tools, equipment and supplies used by subordinates
  Assign work tasks to subordinates
  Check on subordinates during task performance
  Inspect completed work
  Conduct formal scheduled inspections
  Monitor condition of equipment and supplies
  Meet suspense dates and deadlines
  Motivate subordinates in maintenance

CLARIFY ROLES, PROVIDE FEEDBACK
  Provide informal feedback on task performance
  Counsel subordinates - scheduled formal counseling
  Counsel subordinates - unscheduled formal counseling
  Monitor military appearance and bearing

PROVIDE INFORMATION
  Brief newly assigned personnel
  Answer work related questions
  Conduct meetings
  Brief subordinates on current or future missions/requirements
  Pass information down chain of command
  Notify subordinates of changes in plans
  Present formal information briefings
  Provide positive input to supervisors
  Provide commander with information on enemy situation

RECOGNIZE, REWARD
  Provide positive verbal feedback
  Reward soldiers for performance
  Recommend soldiers for promotion
  Write recommendations for awards
  Integrate subordinates into the unit

TRAIN, DEVELOP
  Instruct subordinates on task performance
  Provide individual job training
  Conduct team training
  Provide remedial instruction
  Encourage use of training manuals or job aids
  Counsel subordinates in career planning/personal development
  Develop training plans
  Provide opportunities for leadership

SUPPORT
  Listen to subordinates' personal problems
  Counsel subordinates on personal problems
  Arrange assistance for personal problems
  Assist soldiers with personal problems

DISCIPLINE, PUNISH
  Issue verbal reprimands
  Counsel subordinates with disciplinary problems
  Arrange for extra training and disciplinary/corrective action
  Recommend judicial or non-judicial action to the commander
  Resolve disputes among subordinates

ACT AS A MODEL
  Set the example
  Remain with the assigned unit or element under adverse or wartime conditions
  Share subordinates' hardship




Table 4

Comparison of First and Second Tour Domains by Number of Tasks and Clusters


                                     Number of Tasks    Number of Dimensions
                                     First    Second     First    Second
Occupation                           Tour     Tour       Tour     Tour

Infantryman (11B)                    221*     246         12       11
Cannon Crewmember (13B)              177      225         11       14
Tank Crewmember (19E)                227      290         11       13
Single Channel Radio Operator (31C)  170      209         11       13
Motor Transport Operator (64C/88M)   119      150         12       12
Admin. Specialist (71L)              161      183          9       12
Medical Specialist (91A/B)           239      299         10       12

Note. The 46-item SRQ and its 8 dimensions have not been included in this
table.

*This includes 14 tasks from a dimension which was dropped for second tour
soldiers.




Table 5

New Second Tour Responsibilities by Army Job: Number of Tasks in Clusters


                                                  Occupation
Domain Cluster Type/Name     N(a)   11B    13B   19E   31C   88M   71L    91A

Technical
  Operations                        22(b)   12    16     0     3     3      0
  Administration                     0       0     0    29     0     6(c)  33

Supervisory
  Plan, Organize, Monitor      8     8       8    22    18    12    14     13
  Clarify Roles, Provide
    Feedback                   4     4       4     4     5     4     5      5
  Provide Information          9    13      10    15     9     9    16     12
  Recognize, Reward            5     5       5     5     6     5     6      7
  Train, Develop               8    11      17    11    11     9    11     13
  Support                      4     4       5     4     4     4     6      6
  Discipline, Punish           5     5       5     5     5     5     5      8
  Act as Role Model            3     3       3     3     3     3     3      3

  Total                       46    53      57    69    61    51    66     67

Note. (a) Number of tasks per SRQ category. (b) Augmented category. (c) This
category was larger for first tour. Six AOSP tasks originally in this
category were reassigned to Supervisory categories.




                                           OCCUPATION

                                11B    13B    19E    31C    88M    91A

Supervisory Category
  Plan, Organize, Monitor       4.07   3.66   3.78   4.00   3.90   3.42
  Clarify Roles                 3.25   3.58   4.10   3.98   4.09   3.10
  Provide Information           4.07   3.56   3.78   4.18   4.16   3.46
  Recognize, Reward             3.66   3.66   4.23   4.00   4.02   3.34
  Train, Develop                3.83   3.51   4.03   4.01   4.14   3.20
  Support                       3.69   3.64   4.26   4.07   4.38   3.16
  Discipline, Punish            3.64   3.56   4.14   4.08   4.04   2.47
  Act as Model                  4.65   4.41   4.48   4.31   4.34   3.87

Cluster
  Mean of Common Clusters       4.01   3.81   3.92   3.87   3.82   3.56
  Mean of Technical Clusters    4.12   3.80   4.19   3.54   3.77   3.03
  Mean of Supervisory Clusters  3.86   3.70   4.10   4.08   4.13   3.31


References 


Ash, R. A. (1982). Job elements for task clusters: Arguments for using
multi-methodological approaches and a demonstration of their utility.
Public Personnel Management Journal, 11, 80-90.

Bass, B. M. (1985). Leadership and performance beyond expectations. New
York: Free Press.

Campbell, C. H., Campbell, R. C., Rumsey, M. G., & Edwards, D. C. (1985).
Development and field test of task-based MOS-specific criterion
measures. (ARI Technical Report 717). Alexandria, VA: U.S. Army
Research Institute for the Behavioral and Social Sciences.

Headquarters Department of the Army (1983). Military leadership, FM 22-100.
Washington, DC: Author.

Hough, L. M., Gast, I. F., White, L. A., & McCloy, R. (1986, August). The
relation of leadership and individual differences to job performance.
Paper presented at the Annual Meeting of the American Psychological
Association, Washington, D.C.

Pulakos, E. D., Hanson, M. A., Borman, W. C., Hallam, G., Carter, G., &
Owens-Kurtz, C. (1987, August). Developing behavioral rating
scales to evaluate second tour performance in the Army.

Rumsey, M. G. (1987, August). Getting answers to the right questions: Job
analysis strategy. Paper for presentation at the Annual Meeting of the
American Psychological Association, New York.

Steinberg, A. G., van Rijn, P., & Hunter, F. T. (1986, November). Leader
requirements task analysis. Paper presented at the 28th Annual Military
Testing Association Conference, Mystic, CT.

Steinberg, A. G. (1987, May). Using task analysis to identify Army leader
job requirements. Paper presented at the Sixth International Air
Force Occupational Analyst Workshop, San Antonio, TX.

White, L. A., Gast, I. F., & Rumsey, M. G. (1985, November). Leader behavior
and the performance of first term soldiers. Paper presented at the 27th
Annual Meeting of the Military Testing Association, San Diego, CA.

White, L. A., Gast, I. F., Rumsey, M. G., & Sperling, H. M. (1986).
Categories of leaders' behavior that influence the performance of
enlisted soldiers. (ARI Working Paper RS-WP-86-1). Alexandria,
VA: U.S. Army Research Institute for the Behavioral and Social
Sciences.

Yukl, G. A. (1981). Leadership in organizations. Englewood Cliffs, NJ:
Prentice-Hall.

Yukl, G. A. (1984). Revised leader behavior task list (unpublished).


OVERCOMING  OBJECTIONS  TO  THE  USE  OF 
TEMPERAMENT  VARIABLES  IN  SELECTION 


Leaetta  M.  Hough 

Personnel  Decisions  Research  Institute 


Presented  on  Symposium, 

"New  Perspectives  on  Personality  and  Job  Performance" 

At  the  Annual  Convention  of  the 
American  Psychological  Association 
New  York 


August  1987 


The views expressed in this paper are those of the authors and do not
necessarily  reflect  the  official  opinions  and  policies  of  the  U.S.  Army 
Research  Institute  or  the  Department  of  the  Army. 




Overcoming  Objections  to  the  Use  of 
Temperament  Variables  in  Selection 


In 1982, I was assigned responsibility for developing temperament,
biodata, and interest measures for Project A, a major research project
funded by the Army Research Institute to improve prediction of job
performance of Army enlisted personnel.

When  we  started,  much  of  the  scientific  community  believed  it  would 
be  a  waste  of  time  to  include  temperament  variables  in  a  selection 
battery.  There  were  at  least  five  sources  of  negative  opinion.  First, 
in  1966  Guion  and  Gottier  published  an  article  in  Personnel  Psychology 
that  affected  the  scientific  community’s  attitude  and  knowledge  about 
the  usefulness  of  temperament  variables  for  predicting  job  performance 
criteria. They reviewed the criterion-related validities of temperament
variables  and  concluded  that,  though  temperament  variables  have 
criterion-related  validity  more  often  than  can  be  expected  by  chance,  no 
generalized  principles  could  be  discerned  from  the  results. 

A  second  source  of  negative  opinion  about  temperament  variables 
came  in  the  form  of  a  theoretical  challenge.  In  1968,  Walter  Mischel 
published  his  highly  influential  book  that  caused  an  intense  examination 
of and debate over trait conceptions. Mischel asserted that the apparent
evidence of cross-situational consistency of behavior was a function
of the use of self report as the measurement approach, and that traits
were an illusion. He proposed "situationism," stating that behavior is
explained more by differences in situations than differences in people.

Thus, in 1982 much of the scientific community was persuaded by the
published literature and believed that temperament measures had little
theoretical merit and were of little practical use. Even those who
thought temperament measures might have some merit were concerned that
temperament scales might be inappropriate and unfair to people who were
protected under the 1964 Civil Rights Act. In addition, many people
worried about intentional distortion of self descriptions in an
applicant setting.

Equally important and negative was the lay community's perception
of temperament inventories. People objected to offensive items and
resented being asked to respond to such items. Researchers had been
sensitized by the lay community's negative reaction to temperament
inventories and were legitimately leery of antagonizing the public.

This was the environment in 1982.

Now, in 1987, Army generals are asking us to implement the temperament
inventory we developed. What did we do to bring this about?

RESEARCH  STRATEGY 

A lot of time and effort was required. We also had a research
strategy. That strategy is outlined on page two of your handout.* I'd
like to describe that approach and some of our findings. The research
strategy was construct oriented and included four basic steps: (1) a
literature review to identify predictor constructs that were likely to
predict job performance criteria important to the Army, (2) the
development of a temperament inventory that consisted of nonsensitive
items and scales designed to detect intentional distortion of self
descriptions, (3) a criterion-related validity study to identify
temperament scales that were job-related, and (4) an examination of the
effects of motivational sets on scale scores and criterion-related
validities.

*  Reproduced  at  the  end  of  this  paper. 




Literature  Review 


Predictor and criterion taxonomies. Since our approach was
construct oriented for both predictors and criteria, we needed a
taxonomy for both predictors and criteria. The criterion categories were
education, training, job involvement, job proficiency, and adjustment.
For the predictors, we started with the structure initially found by
Tupes and Christal (1961) in the early 60s. Following Hogan's thinking
in the early 80s, we split one of the constructs into two. Thus, our
predictor taxonomy consisted of six constructs: Surgency, Affiliation,
Adjustment, Agreeableness, Dependability, and Intellectance.

Categorization of temperament scales. Once we had a predictor
taxonomy, our next step was to categorize existing temperament scales
into the classification scheme. From articles and manuals, we obtained
hundreds of correlations between temperament scales. We categorized the
temperament scales into the six categories and a miscellaneous
category, and then refined the classifications through an iterative
process of classifying and reclassifying temperament scales to maximize
the mean within-category correlations and minimize the mean
between-category correlations. The results of this process are shown in
Table 1 of your handout. The circles in the diagonal show the mean
within-category correlations, which are in the .30s and .40s and are, in
all cases, higher than the mean between-category correlations.
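
The quantity being optimized in that iterative process can be sketched in a few lines. The following is only an illustration, not the procedure actually used in the project: the scale names, category assignments, and correlation values are hypothetical, and the function name is an assumption made for the example.

```python
from statistics import mean


def within_between_means(correlations, category_of):
    """Return (mean within-category r, mean between-category r).

    correlations: dict mapping a pair of scale names to their correlation.
    category_of:  dict mapping each scale name to its assigned category.
    """
    within, between = [], []
    for (scale_a, scale_b), r in correlations.items():
        if category_of[scale_a] == category_of[scale_b]:
            within.append(r)
        else:
            between.append(r)
    return mean(within), mean(between)


# Hypothetical scales and correlations, for illustration only.
category_of = {
    "dominance": "Surgency",
    "energy": "Surgency",
    "stability": "Adjustment",
    "calmness": "Adjustment",
}
correlations = {
    ("dominance", "energy"): 0.40,
    ("stability", "calmness"): 0.36,
    ("dominance", "stability"): 0.10,
    ("dominance", "calmness"): 0.06,
    ("energy", "stability"): 0.12,
    ("energy", "calmness"): 0.08,
}

within_r, between_r = within_between_means(correlations, category_of)
```

A reclassification step would then move one scale to another category, recompute both means, and keep the move only if the within-category mean rises relative to the between-category mean.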

Meta analysis of criterion-related validities. Our next step was
to summarize the criterion-related validities according to these
constructs; Table 2 of your handout shows the results. It is a meta
analysis of the criterion-related validities of scales within each
predictor construct for each criterion construct. As you can see, several
temperament constructs correlate with the criteria. Note that there are
three additional predictor constructs. These three, "Achievement,"
"Masculinity," and "Locus of Control," were all a part of the
miscellaneous category. When we summarized the validities for the
miscellaneous category, we found respectable validities there too, so we
looked more closely at the scales included in the miscellaneous category
and found these additional three constructs.

The  results  in  this  table  are  different  from  the  results  that  Guion 
and  Gottier  obtained.  We  believe  that  our  strategy  of  summarizing  the 
validities  according  to  both  predictor  and  criterion  constructs  accounts 
for  the  difference  in  results.  To  test  this  hypothesis,  we  summarized 
the  validity  coefficients  in  our  database  without  regard  to  construct 
and  obtained  a  coefficient  of  essentially  zero,  quite  different  from  the 
coefficients  in  Table  2.  We  believe  this  demonstrates  the  importance  of 
constructs  as  organizing  principles  for  examining  and  understanding  the 
literature  on  the  criterion-related  validity  of  temperament  variables. 
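
The contrast between summarizing by construct and summarizing without regard to construct can be illustrated with a small sketch. It assumes a sample-size-weighted mean as the summary statistic, which the paper does not specify, and every label and coefficient below is hypothetical; the point is only that coefficients of opposite sign can cancel when pooled.

```python
from collections import defaultdict


def weighted_mean_r(studies):
    """Sample-size-weighted mean of (r, n) validity coefficients."""
    total_n = sum(n for _, n in studies)
    return sum(r * n for r, n in studies) / total_n


# Hypothetical records: (predictor construct, criterion construct, r, N).
records = [
    ("Predictor A", "Criterion X", 0.30, 150),
    ("Predictor A", "Criterion X", 0.20, 150),
    ("Predictor B", "Criterion Y", -0.25, 300),
]

# Summary within each predictor-by-criterion construct pair.
groups = defaultdict(list)
for predictor, criterion, r, n in records:
    groups[(predictor, criterion)].append((r, n))
by_construct = {pair: weighted_mean_r(vals) for pair, vals in groups.items()}

# Summary that ignores constructs entirely.
pooled = weighted_mean_r([(r, n) for _, _, r, n in records])
```

In this made-up example each construct pair shows a mean validity of .25 in absolute value, while the pooled coefficient is essentially zero.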

We  used  the  results  in  this  table  to  guide  us  in  selecting  predictor 
constructs  to  measure. 

Development  of  Temperament  Scales 

The next step in our research strategy was to develop measures of
the constructs that the literature review indicated were likely to
predict criteria important to the Army. List 1 of your handout shows
the substantive scales we developed for each construct. We developed
measures for six constructs: Surgency, Adjustment, Agreeableness,
Dependability, Achievement, and Locus of Control. We also developed a
"Physical Condition" scale and four response validity scales:
Non-Random Response, Social Desirability, Poor Impression, and
Self-Knowledge. We developed the Non-Random Response scale to detect
inventories that had been completed carelessly, a "Social Desirability"
scale to detect intentional distortion that might occur in an applicant
setting or a non-draft setting, and a "Poor Impression" scale to detect
intentional distortion that might occur in a draft setting. We called
the inventory the ABLE, short for Assessment of Background and Life
Experiences.

We revised the items and scales in the ABLE many times. People
representing a variety of perspectives reviewed the items for sensitive
content. We also pretested the scales three times, each time evaluating
and revising the items and scales based on soldiers' verbal feedback,
item response distributions, internal consistency estimates, and
test-retest reliabilities. The scale statistics for the ABLE scales appear
in Table 3 of your handout. The average number of items in a scale is
15. The median alpha of the substantive scales is .81, and the median
test-retest reliability of the substantive scales is .78. Table 4
summarizes the ABLE substantive scale statistics as well as correlations
of the ABLE substantive scales with each other and with other components
of the four-hour predictor battery. The only part of the predictor battery
that the ABLE substantive scales correlate with in any sizable way is
other ABLE substantive scales. The ABLE substantive scales appear to be
tapping a part of the predictor domain not tapped by other measures.
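
For readers less familiar with the internal consistency estimate quoted above, coefficient alpha can be computed from item-level responses as in the minimal sketch below. The response data are hypothetical and the function name is an assumption for the example; this is not the project's scoring code.

```python
from statistics import variance


def cronbach_alpha(item_scores):
    """Coefficient alpha for a scale.

    item_scores: list of respondents, each a list of k item responses.
    """
    k = len(item_scores[0])
    # Variance of each item across respondents.
    item_vars = [variance([resp[i] for resp in item_scores]) for i in range(k)]
    # Variance of the total (summed) scale score.
    total_var = variance([sum(resp) for resp in item_scores])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)


# Hypothetical responses of five soldiers to a three-item scale.
responses = [
    [3, 3, 2],
    [1, 2, 1],
    [2, 2, 3],
    [3, 2, 3],
    [1, 1, 1],
]
alpha = cronbach_alpha(responses)
```

Alpha rises as the items covary more strongly relative to their individual variances, which is why it is read as an index of internal consistency.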

Demonstration  of  Job-Relatedness 

The next step in our research strategy was to demonstrate the
job-relatedness of our temperament scales. We conducted a concurrent
validity study during the summer and fall of 1985. Over 9000 soldiers
completed the 4-hour predictor battery that included measures of cognitive
ability, spatial ability, perceptual psychomotor ability, work
environment preferences, interests, and temperament.

Criterion-related validities. The criterion measures, the
development of which was a major part of the research project, were
developed by a different part of the research team. The criterion
composites are very briefly described in List 2 of your handout. There are
five composites: Core Technical Proficiency, General Soldiering
Proficiency, Effort and Leadership, Personal Discipline, and Physical
Fitness and Military Bearing. The first two consist mainly of work samples
and knowledge tests. The other three consist of supervisory and peer
ratings and information obtained from personnel records.

Table 6 of your handout shows the criterion-related validities of
the ABLE scales for these five criteria. The results suggest that
Achievement scales are the best predictors of the "Effort and
Leadership" criterion; Dependability scales are the best predictors of the
"Personal Discipline" criterion; and Physical Condition is the best
predictor of the "Physical Fitness and Military Bearing" criterion,
though the Achievement scales also correlate with this criterion. These
three criteria include the supervisory and peer ratings. The other two
criteria, Core Technical Proficiency and General Soldiering Proficiency,
which consist of work sample and knowledge tests, are not predicted with
the ABLE substantive scales.

Table 7 in your handout shows the criterion-related validities of
the different types of predictors included in the study. It shows the
multiple correlations of each type of predictor with each of the five
criteria. As you can see, the best predictors of the supervisory and
peer rating criteria, that is, Effort and Leadership, Personal
Discipline, and Physical Fitness and Military Bearing, are the ABLE
substantive scales. The other conclusion from this table is that the ASVAB
mental ability test and the ABLE temperament inventory are the two best
predictors of the criterion domain.

Fairness 

We next turned to the issue of fairness. Are the items and scales
fair for groups protected under the 1964 Civil Rights Act? The mean
scores for whites, blacks, and Hispanics appear in Table 8 of your
handout. As you can see, minorities do not tend to score lower than
whites on the ABLE scales. Our efforts to write items that were not
biased against minorities appear to have been successful. We are
currently conducting differential validity and fairness analyses; those
analyses, however, are not yet complete.

Examination  of  Effects  of  Motivational  Set 

The fourth component of our research strategy involved investigating
several issues related to motivational set. A frequent criticism of
self-report inventories is that respondents can intentionally distort
their responses. When respondents are applicants, this is an especially
important criticism because the criterion-related validities might be
negatively affected by distorted responses. We therefore studied the
impact of motivational set on criterion-related validities, the extent
to which applicants distort their self descriptions, and the usefulness
of the four response validity scales to detect and adjust for
motivational set.

Faking study. First, we conducted an experiment in which soldiers
were instructed to respond honestly or to distort their responses in a
specified way. The participants in the experiment were 245 enlisted
soldiers at Ft. Bragg. The design was a repeated measures design with
faking and honest conditions counter-balanced. We performed a multivariate
analysis of variance on the ABLE scales and found that soldiers can
distort their responses when instructed to do so.

We  then  examined  the  extent  to  which  the  response  validity  scales 
detected  intentional  distortion.  Table  9  of  your  handout  shows  the 
results.  The  last  two  columns  show  the  effect  size  of  the  difference 
between  honest  and  fake  good  and  honest  and  fake  bad.  Effect  size  can 
be  interpreted  in  standard  deviation  terms.  Thus,  the  difference  in  the 
honest  and  fake  good  condition  for  Social  Desirability  is  essentially 
one  standard  deviation;  the  Social  Desirability  scale  detects  distortion 
in  the  fake  good  condition.  As  you  can  see,  the  Non-Random  Response, 
Poor  Impression,  and  Self-Knowledge  scales  detect  distortion  in  the  fake 
bad  condition. 
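
As a rough illustration of an effect size expressed in standard deviation terms, the sketch below computes a standardized mean difference using a pooled standard deviation. The paper does not state which effect size formula was used, so the pooled-SD form is an assumption, the sketch ignores the repeated measures pairing, and the scores are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev


def effect_size(condition_scores, honest_scores):
    """Standardized mean difference (condition minus honest), pooled SD."""
    n1, n2 = len(condition_scores), len(honest_scores)
    s1, s2 = stdev(condition_scores), stdev(honest_scores)
    pooled_sd = sqrt(((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / (n1 + n2 - 2))
    return (mean(condition_scores) - mean(honest_scores)) / pooled_sd


# Hypothetical Social Desirability scores under two instruction sets.
honest = [10, 12, 14, 16, 18]
fake_good = [13, 15, 17, 19, 21]

d = effect_size(fake_good, honest)
```

With these made-up scores the fake good mean sits a little under one pooled standard deviation above the honest mean.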

We next examined the extent to which we could use the response
validity scales Social Desirability and Poor Impression to adjust ABLE
substantive scales for faking. Table 10 shows the effect of regressing
out Social Desirability in the fake good condition and the effect of
regressing out Poor Impression in the fake bad condition. Median values
are reported in this table. The .49 in the upper left-hand cell
indicates that the median difference in ABLE scores between the honest and
fake good condition before regressing out Social Desirability is .49 or
half a standard deviation. That is, ABLE scale scores differ by about
half a standard deviation in the fake good condition as compared to the
honest condition. The next number to the right shows that after
regressing out Social Desirability from the fake good condition, the ABLE
substantive scales differ from the honest condition by only .14 or just
over one-tenth of a standard deviation.




The  next  two  values  to  the  right  show  the  results  for  the  honest 
and  fake  bad  conditions.  Clearly,  the  Social  Desirability  and  Poor 
Impression  scales  can  be  used  to  adjust  substantive  scale  scores  for 
intentional  distortion. 
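
One way to picture "regressing out" a response validity scale is the sketch below. The paper does not spell out the regression procedure, so the choices here are assumptions: a single slope is fitted on the combined honest and fake good samples, and each substantive score is adjusted to the overall mean Social Desirability level. All scores are hypothetical.

```python
from statistics import mean


def slope(x, y):
    """Least-squares slope of y regressed on x."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def regress_out(y, x, b, x_ref):
    """Adjust each y as if its x had been at the reference level x_ref."""
    return [yi - b * (xi - x_ref) for xi, yi in zip(x, y)]


# Hypothetical Social Desirability (sd) and substantive scale scores.
honest_sd, honest_scale = [2, 3, 4, 5, 6], [20, 22, 21, 24, 23]
fake_sd, fake_scale = [6, 7, 8, 9, 10], [25, 27, 26, 29, 28]

b = slope(honest_sd + fake_sd, honest_scale + fake_scale)
x_ref = mean(honest_sd + fake_sd)

gap_before = mean(fake_scale) - mean(honest_scale)
gap_after = (mean(regress_out(fake_scale, fake_sd, b, x_ref))
             - mean(regress_out(honest_scale, honest_sd, b, x_ref)))
```

In this made-up example the difference between conditions shrinks from 5 raw score points to 0.6 after adjustment, which parallels the kind of reduction reported in Table 10.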

These data demonstrate that: (1) people can distort their
responses to temperament scales, (2) response validity scales can detect
such distortion, and (3) the response validity scales can be used to
adjust temperament scale scores for distortion.

We then asked, to what extent do applicants distort their
responses? To answer this question, we compared scale scores of 121 Army
applicants with scale scores of two groups of soldiers who had no motive
for distorting their responses. Table 11 shows the results. On the
substantive scales, applicants actually scored lower than one or both
groups of soldiers 9 out of 11 times. These data suggest that
applicants do not appear to distort their responses.

Nevertheless, we examined the effects of inaccurate self
descriptions, as detected by the response validity scales, on
criterion-related validities obtained in the concurrent validity study.
Table 12 shows that validities for the group detected as responding in a
random way are significantly lower than validities for the group
responding conscientiously. Table 13 shows the increment in validity when
Social Desirability is used as a moderator variable. Table 14 shows the
increment in validity when Poor Impression is used with each substantive
scale in a multiple correlation. The data in these three tables indicate
that the response validity scales do improve, modestly, the validities of
the substantive scales even in a concurrent validity study where there is
little motive to distort one's self description.
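
The increment obtained by adding a response validity scale to a substantive scale in a multiple correlation can be illustrated with the standard two-predictor formula. The correlations below are hypothetical and are not values from Tables 13 or 14; the moderator analysis with Social Desirability would need an interaction term and is not shown.

```python
from math import sqrt


def multiple_r(r_y1, r_y2, r_12):
    """Multiple correlation of a criterion with two predictors.

    r_y1, r_y2: correlations of each predictor with the criterion.
    r_12:       correlation between the two predictors.
    """
    r_squared = (r_y1 ** 2 + r_y2 ** 2 - 2 * r_y1 * r_y2 * r_12) / (1 - r_12 ** 2)
    return sqrt(r_squared)


# Hypothetical values: a substantive scale, a response validity scale,
# and the correlation between the two.
r_scale, r_validity, r_between = 0.30, 0.10, -0.20

R = multiple_r(r_scale, r_validity, r_between)
increment = R - abs(r_scale)
```

The increment is largest when the added scale correlates with the criterion, or suppresses irrelevant variance in the substantive scale, without being redundant with it.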




Project A researchers are currently conducting a predictive
validity study which will provide an opportunity to evaluate the validities
of the ABLE substantive scales and the usefulness of the response
validity scales in a selection situation.

Summary 

We  overcame  objections  to  the  use  of  temperament  variables  in 
selection  by: 

1.  reviewing the literature using a construct-based approach to
identify useful temperament constructs in previous criterion-
related validity studies;

2.  focusing scale development on constructs that are likely to predict
criteria important to the client;

3.  developing scales that consist of items acceptable to the public;

4.  developing scales that are not biased against minorities;

5.  developing scales that are psychometrically good;

6.  developing response validity scales to detect inaccurate self
descriptions;

7.  evaluating job-relatedness of scales by demonstrating criterion-
related validity;

8.  developing and evaluating "adjustments" to substantive scale scores
based on response validity scale scores; and

9.  evaluating the effect of motivational set on scale scores and
criterion-related validities.




REFERENCES 


Guion, R. M., & Gottier, R. F. (1966). Validity of personality
measures in personnel selection. Personnel Psychology, 18, 135-164.

Mischel, W. (1968). Personality and assessment. New York: Wiley.




Table 1  Mean Within-Category and Between-Category
Correlations of Temperament Scales

[Table body not recoverable from the scan. The matrix reports the mean
correlation, its standard deviation, and the number of correlations for
each pairing of the categories Surgency, Adjustment, Agreeableness,
Dependability, Intellectance, Affiliation, and Miscellaneous.]



Table 2  Meta-Analysis of Criterion-Related Validity Studies
That Used Temperament Predictors


Time Period 1960-1984.

* A star denotes the construct is one of the "Big Five" constructs.

Note: Correlations are not corrected for unreliability or range restriction.




Table 3  ABLE Scale Statistics for Total Group^

(Concurrent Sample; Revised Trial Battery)




Total group after screening for missing data and random responding.

N = 408-412 for test-retest correlations (N = 414 for Non-Random Response test-retest correlations).
Screened only for missing data.


List  1 


ABLE^  Scales  Organized  According  to  Construct  Intended  to  Measure 


SUBSTANTIVE  SCALES: 


Surgency

.  Dominance
.  Energy Level

Adjustment

.  Emotional Stability

Agreeableness (Likeability)

.  Cooperativeness

Dependability

.  Nondelinquency 
.  Traditional  Values 
.  Conscientiousness 

Achievement 

.  Work  Orientation 
.  Self  Esteem 

Locus  of  Control 

.  Internal  Control 

Physical  Condition 

.  Physical  Condition 


RESPONSE  VALIDITY  SCALES: 

.  Non-Random  Response 
.  Social  Desirability 
.  Poor  Impression 
.  Self-Knowledge 


^  Inventory  developed  by  PDRI  for  the  Army  Research  Institute  entitled 
"Assessment  of  Background  and  Life  Experience." 




Table 4  ABLE Substantive Scales: Summary
(Revised Trial Battery)

[Table body not recoverable from the scan.]



Table 5

ABLE Substantive Scales: Factor Analysis^
(Concurrent Sample; Revised Trial Battery)




^Principal component analysis, varimax rotation.

Note: N = 6367


List  2 


Criterion  Composites^ 


Core Technical Proficiency - a) hands-on tests of MOS-specific technical
knowledge and skills; and b) tests of school and job knowledge.


General Soldiering Proficiency - a) hands-on tests of general soldiering
skill; and b) general soldiering knowledge and skill test items.


Effort & Leadership - a) supervisory and peer ratings of effort and
leadership, overall effectiveness, MOS effectiveness and predicted combat
effectiveness; and b) letters and certificates of commendation and other
achievements.


Personal Discipline - a) supervisory and peer ratings of personal control
and discipline; and b) disciplinary actions and other negative indicators
in personnel files.


Physical Fitness & Military Bearing - a) supervisory and peer ratings of
physical fitness and military bearing; and b) physical readiness tests.


^Data gathered at same time as Trial Battery was administered, i.e., summer
and fall of 1985.


Table 6  Validities of ABLE Scales for Job Performance Criteria:

Zero-Order Correlations

(Revised Trial Battery; Concurrent Validity Study)


                                                    Criterion

                            Core         General                               Physical Fitness
                            Technical    Soldiering   Effort &     Personal    & Military
Predictor                   Proficiency  Proficiency  Leadership   Discipline  Bearing

Surgency:
.  Dominance                  .01          .01          .15          .02       [illegible]

Achievement:
.  Self Esteem                .02          .01          .20          .12          .20
.  Work Orientation           .02          .02          .22          .10          .21
.  Energy Level               .02       [illegible]     .22          .14          .25

Adjustment:
.  Emotional Stability        .02          .02          .12          .12          .16

Agreeableness (Likeability):
.  Cooperativeness            .01          .02          .19       [illegible]     .14

Dependability:
.  Traditional Values         .03          .06          .13          .25          .16
.  Non-delinquency            .09          .02          .12          .29          .14
.  Conscientiousness          .02          .02          .10          .23          .22

Others:
.  Internal Control           .04          .09          .13          .13          .13
.  Physical Condition      [illegible]    -.09          .09         -.03       [illegible]

Response Validity Scales:
.  Non-Random Response^       .13          .14          .02          .10          .02
.  Social Desirability       -.02         -.06          .02          .09          .02
.  Poor Impression         [illegible]    -.09         -.19         -.15         -.16
.  Self Knowledge            -.04         -.01          .02          .09          .13

^Correlations are based on unscreened data for this scale. N varies from 0424 to 9122 for this
scale.

Note: N varies from 76C6 to 0427.

Note: A box indicates notable predictor/criterion construct relationships.




Table 7


Multiple Correlations^ of Six Independent
Predictor Composites with Each of Five Job
Performance Criteria

(Concurrent Validity Study)


Temperament (and
physical activities scale)  .2A


^Multiple R's are adjusted for shrinkage and corrected for restriction in range, but not corrected for criterion
unreliability.

^Mental ability test currently used by military.

Note: Entries in table are averaged across 9 Army Military Occupational Specialties (MOS) with complete criterion data;
total sample is 5902. Sample sizes range from 281 to 570; median = *32.

Note: Boxes denote the two best predictors of the criterion space.




ABLE Scale Means and Standard Deviations Separately for Fact (Trial Battery,
Revised)

[Table body not recoverable from the scan.]




ABLE Response Validity Scales

[Table body not recoverable from the scan.]


TABLE 10


Effects of Regressing Out Response Validity Scales
(Social Desirability and Poor Impression)
in Faking Conditions for ABLE


                             Honest vs. Fake Good          Honest vs. Fake Bad
                                 Effect Size                   Effect Size

                             Before       After            Before       After
                             Adjustment   Adjustment       Adjustment   Adjustment

ABLE Substantive Scales        .49        [illegible]       2.10          .45




Table 11


Comparison of Ft. Bragg (Honest), Ft. Knox, and MEPS (Applicants) ABLE Scales


                             Ft. Bragg         MEPS
                             (Honest)*         (Applicants)      Ft. Knox         Total

                             N     Mean        N     Mean        N     Mean

Response Validity Scales

Social Desirability         116    15.91      121    16.63      276    16.60       3.21
Self-Knowledge              116    29.54      121    28.03      276    29.64       3.63
Non-Random Response         116     7.58      121     7.79      276     7.75        .64
Poor Impression             116     1.50      121     1.05      276     1.54       1.84

Substantive Scales

Emotional Stability         112    66.22      118    66.03      272    65.05       7.86
Self-Esteem                 112    34.77      118    34.04      272    35.12       5.00
Cooperativeness             112    53.33      118    54.60      272    54.19       6.05
Conscientiousness           112    46.37      118    46.49      272    48.97       5.86
Non-Delinquency             112    [illegible].24  118    54.36      272    55.49       6.91
Traditional Values          112    36.67      118    36.97      272    37.28       4.50
Work Orientation            112    59.71      118    58.37      272    61.40       7.73
Internal Control            112    49.48      118    51.90      272    50.37       6.13
Energy Level                112    57.56      118    56.67      272    57.19       6.95
Dominance                   112    35.54      118    32.84      272    35.41       6.05
Physical Condition          112    32.96      118    28.27      272    31.08       7.49

*Scores are based on persons who responded to the honest
condition first.




We  performed  a  split  group  analysis  rather  than  a  moderated  regression  because  the 
variable  of  interest  had  a  highly  skewed  distribution. 


Table 13.  Moderating Effects of "Social Desirability" Scale on Correlations
Between ABLE Scales and Job Performance Criteria

[Table body not recoverable from the scan.]




SOURCES  OF  NEGATIVE  OPINION 


•  Guion & Gottier literature review--concludes
temperament measures are of little practical
use

•  Theoretical  challenge--situationism  (Mischel) 

•  Inappropriate  and  unfair  for  persons  protected 
by  1964  Civil  Rights  Act 

•  Intentional  distortion  of  self  reports  in  applicant 
setting 

•  Offensive  item  content 




RESEARCH  STRATEGY:  CONSTRUCT  ORIENTATION 


1.  Review  Literature 

•  Develop  predictor  taxonomy 

•  Classify  temperament  scales 

•  Develop  criterion  taxonomy 

•  Summarize criterion-related validities
according to predictor and criterion
constructs

•  Identify  useful  predictor  constructs 




RESEARCH  STRATEGY:  CONSTRUCT  ORIENTATION 


2.  Develop Temperament Scales

•  Examine items for sensitive content

•  Develop response validity scales to
detect intentional distortion

•  Pretest

•  Examine psychometric characteristics

•  Revise




RESEARCH  STRATEGY:  CONSTRUCT  ORIENTATION 


3.  Demonstrate  Job-Relatedness 

•  Conduct  concurrent  validity  study 

•  Compute  criterion-related  validities 

•  Conduct  differential  validity  analyses 

•  Conduct  fairness  analyses 

•  Conduct  predictive  validity  study 




RESEARCH  STRATEGY:  CONSTRUCT  ORIENTATION 


4.  Examine  Effects  of  Motivational  Set 

•  Evaluate  fakability  of  scales 

•  Evaluate  response  validity  scales 

•  Evaluate  moderator  effects  of  response 

validity  scales 

•  Develop "adjustment" formula

•  Assess  effects  on  criterion-related 

validities 




SUMMARY 


1.  Review  the  literature  using  a  construct-based  approach  to 
demonstrate  the  usefulness  of  temperament  variables  in 
previous  research. 

2.  Focus  scale  development  on  constructs  that  are  likely  to 
predict  criteria  important  to  the  client. 

3.  Develop  scales  consisting  of  items  acceptable  to  the  public. 

4.  Develop  scales  that  are  not  biased  against  minorities. 

5.  Develop  scales  that  are  psychometrically  good. 

6.  Develop  response  validity  scales  to  detect  inaccurate 
self  descriptions. 

7.  Evaluate  job-relatedness  of  scales  by  demonstrating 
criterion-related  validity. 

8.  Develop and evaluate "adjustments" to substantive scale
scores based on response validity scale scores.

9.  Evaluate  effect  of  motivational  set  on  scale  scores  and 
criterion-related  validities. 




OPTIMAL  JOB  ASSIGNMENT  AND  THE  UTILITY  OF  PERFORMANCE: 

SOME  KEY  ISSUES 


Roy  Nord 

Leonard  A.  White 
U.S.  Army  Research  Institute 


Presented  at  the  Annual  Convention  of  the 
American  Psychological  Association 
New  York 

August  1987 


The views expressed in this paper are those of the authors and do not
necessarily reflect the official opinions and policies of the U.S. Army
Research Institute or the Department of the Army.




Key  Issues 


BACKGROUND

Decision-makers in the military services are frequently faced with policy
alternatives that will produce different distributions of soldier competence
across a large set of jobs. These alternatives include not only policies
dealing directly with the selection, allocation, and training of soldiers, but
also a range of actions that affect the share of scarce resources devoted to
manpower as opposed to other "inputs" to the process of producing national
defense. Whenever such alternatives are compared, a judgement, either
implicit or explicit, must be made as to the "value" (and cost) of different
distributions of soldier performance.

The primary concern of this paper is the measurement and use of
performance utility as an aid to personnel classification and assignment
decisions. Historically, most of the research on performance utility in
industrial psychology has addressed the problem of translating performance
gains from various selection strategies into a metric that can be used to
demonstrate the value of improved selection procedures to skeptical
decision-makers. The most common metric is dollar value (see, e.g., Brogden;
Cascio, 1987; Hunter and Schmidt, 1982), although metrics other than dollar
value have also been used (Eaton, Wing, and Mitchell, 1985).

The appropriate metric for comparisons among different classification and
assignment procedures may be quite different from the metric needed to assess
selection strategies. Sadacca and Campbell (1985) noted that, in the context
of optimal assignment, a dollar metric is not required and, in some cases,
may be inappropriate. The reason for this difference is straightforward: In
the context of selection, an organization must consider the tradeoffs between
the expenditure of scarce resources (often dollars) to increase the quality
of its manpower versus the expenditure of those resources to increase the
quantity or quality of some other factor of production. In the case of
classification and assignment, the objective is to maximize the efficiency
with which a given pool of available manpower is used. In either case, the
organization is concerned with the efficient allocation of a scarce resource
among competing activities, but in the case of selection, manpower is an
"activity" and in the case of assignment, it is a "resource".

Performance "Utility" vs Performance "Value"

Before pursuing the discussion further, a brief semantic digression is in
order. The term utility has been widely used in personnel psychology to
refer to what this paper will call performance value.

A reasonable argument can be made that "utility", in its most general
sense, is a more appropriate term than "value", given the unquantifiable
nature of the "outputs" we are analyzing. We shall employ the term "value",
however, for two reasons: first, to avoid confusion between our focus on the
role of performance value in determining the optimal distribution of people
to jobs and the more common use of performance utility in evaluating the
benefits of selection and classification systems; second, to distinguish our
interpretation of subjective judgements of performance value from the way
similar judgements are used in most applications of multi-attribute utility
theory, which seek to identify the parameters of individual "utility
functions".

In these applications, the approach we follow (that is, the averaging
of individual judgements to obtain a single performance value function) would
require the use of interpersonal comparisons of utility that are prohibited
by the theory upon which multi-attribute utility is based (e.g., Keeney and
Raiffa, 1976). This restriction does not apply to the analysis described
here because we do not treat individual judgements of performance value as
"utility functions" but rather as imperfect (but randomly distributed)
estimates of a single organizational "value function".

The Army's Project A: Measuring the Value of Job Performance

The data reported here were collected as part of a nine-year Army
research effort (Project A) aimed at improving the Army's selection and
classification system for enlisted personnel. The main purpose of the
utility-measurement component of Project A is to provide the information
needed to maximize the payoff to the Army of improved selection and
classification procedures.

The Army research on performance value assessment is being carried out in
two stages. The first stage, completed earlier this year, focused on the
estimation of MOS-specific performance value functions for 276 entry-level
Army MOS (Sadacca, Campbell, Wise, and White, 1987). In the second stage, which
has just begun, the resulting functions, under different configurations of
constraints, will be used to produce distributions of performance across 19
selected jobs, and senior Army officers will be asked to evaluate the
resulting distributions. In effect, the purpose of the second stage is to
evaluate alternative strategies for implementing performance value estimates
in an optimal job assignment system.

In this section, we will briefly summarize the data collection approach
and results of the first stage. The following section will examine the
effects of using the resulting value functions to make job assignment
decisions.

The first stage consisted of a series of 7 workshops at which we obtained
judgements by 74 field-grade officers of the relative value of performance at
five levels in all entry-level Army MOS. At each workshop, the performance
level/job combinations were scaled using two methods. In the first, each
officer received one of seven decks of cards. Each card specified a
performance percentile and an MOS, the duties of which were briefly described
on the card. The officers were asked to sort the cards into six groups: one
group representing combinations with a negative value and five representing
ordinal rankings of increasingly valuable combinations. (An additional and
infrequently used category was provided for combinations which could not be
judged.) The seven decks combined spanned the entire set of performance
level/MOS combinations for 276 MOS. In addition, each deck contained 60
combinations (12 MOS x 5 performance levels) that were common across all
decks. In the second exercise, judges provided interval-level estimates of
the relative value of these 60 combinations. In this exercise, the value of
a 90th percentile infantryman (MOS 11B) was fixed at 100. The officers were
then asked to scale the remaining combinations relative to this fixed value.
Negative values were allowed.

The performance levels were set at the 10th, 30th, 50th, 70th and 90th
percentiles, using the current (1986) recruit pool as the reference
population. The instructions specified that the judgements should be made under
an assumption that the world was in a state of "heightened tensions". Care
was taken to explain that the "performance" being evaluated was
multidimensional, consisting not only of technical proficiency, but also
personal discipline and willingness to work.

The sample of officers providing the judgements represented a
cross-section of specialties. A primary consideration in this exercise was to
insure that performance value judgements, to the maximum extent possible,
reflected the payoffs of performance to the Army. To accomplish this, we
used as judges experienced senior officers with a broad perspective on those
needs. Furthermore, the effect of specialty on the judgements of performance
value was generally insignificant. The judges' mean utilities had a
reliability of .990 on the interval scale judgements, and from .958 to .976
across the six decks for the ordinal judgements.

To insure that the performance value estimates for all 1380
performance-level/MOS combinations were comparable, functions were estimated
for each deck to transform the pile placement judgements to the interval scale
used for the 60 common combinations. Table 1 contains the estimated
coefficients and statistics for these functions. The robustness of the
transformation functions was checked by estimating the functions using 40
combinations selected from the set of 60, and then regressing predicted
against actual values for the 20 omitted combinations. The resulting r-square
statistic was .945, suggesting that the transformations should yield highly
accurate estimates of the values that would have been obtained by directly
scaling all 1380 combinations.
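The hold-out check described above amounts to computing an r-square between predicted and actual values for the omitted combinations. The following is a minimal sketch of that statistic only; the function name `r_squared` and the five data points are hypothetical illustrations, not values from the study.

```python
def r_squared(predicted, actual):
    """Squared Pearson correlation between two equal-length sequences.

    For a simple regression of one variable on the other, this equals
    the r-square of the regression.
    """
    n = len(predicted)
    mean_p = sum(predicted) / n
    mean_a = sum(actual) / n
    sxy = sum((p - mean_p) * (a - mean_a) for p, a in zip(predicted, actual))
    sxx = sum((p - mean_p) ** 2 for p in predicted)
    syy = sum((a - mean_a) ** 2 for a in actual)
    return (sxy * sxy) / (sxx * syy)


# Hypothetical interval-scale values for a few held-out combinations.
actual = [10.0, 35.0, 60.0, 80.0, 100.0]
predicted = [14.0, 33.0, 57.0, 84.0, 97.0]

print(round(r_squared(predicted, actual), 3))
```

In the study the comparison was made on the 20 combinations left out of the estimation, so that the statistic reflects how well the transformation generalizes rather than how well it fits.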


Table 1

Deck-Specific Regression Equations for Transforming
Average Pile Placement to Common Interval Scale
(Regressions for 60 Common Combinations)


Independent                                   Deck
Variable                  A        B        C        D        E        F

Pile Placement (PP)     14.00    21.81    43.39    24.09    46.99    49.45
PP^2                    1.455   -.1121   -5.785   -1.344   -6.922   -6.932
PP^3                   -.0529    .0671    .4853    .1692    .5450    .5487
Intercept              -34.09   -41.75   -69.54   -47.74   -63.80   -77.85

Adjusted R^2             .965     .926     .954     .944     .912     .924
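To make the transformation concrete, the sketch below evaluates the Deck A equation from Table 1. It rests on two assumptions that the text does not state outright: that the three slope rows are the linear, squared, and cubed pile-placement terms, and that piles are coded 1 through 6. The names `DECK_A` and `to_interval_scale` are hypothetical.

```python
# Deck A coefficients as printed in Table 1 (assumed to be a cubic in PP).
DECK_A = {"intercept": -34.09, "pp": 14.00, "pp2": 1.455, "pp3": -0.0529}


def to_interval_scale(pile_placement, coef=DECK_A):
    """Map an average pile placement onto the common interval scale."""
    pp = pile_placement
    return (coef["intercept"]
            + coef["pp"] * pp
            + coef["pp2"] * pp ** 2
            + coef["pp3"] * pp ** 3)


# Assumed coding: pile 1 is the negative-value group, pile 6 the most valuable.
for pile in range(1, 7):
    print(pile, round(to_interval_scale(pile), 2))
```

Under these assumptions the transformed value rises steadily with pile placement, which is what one would expect if the piles are ordinal rankings of increasing value.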

The average interval scale values at each performance level were then
used to fit a "performance value" function for each job. The functions were
fitted using stepwise ordinary least squares where the independent variables
were performance level, its square, and its cube. The graphs of those
functions for nine MOS are presented in Figure 1. These functions illustrate
several interesting aspects of our results so far.

First, for most MOS, the relationship between performance level and
performance value is a concave function. That is, the functions demonstrate
diminishing payoffs to increases in performance as the performance level
increases. As we shall see in the next section, this characteristic of the
value functions plays a critical role in the context of optimal assignment.
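Because the five performance levels are equally spaced, concavity can be checked directly from the successive gains in value. The sketch below does this for a hypothetical set of values; the numbers and the function names `marginal_gains` and `is_concave` are illustrative assumptions, not results from the study.

```python
# The five performance percentiles used in the scaling exercise.
LEVELS = [10, 30, 50, 70, 90]


def marginal_gains(values):
    """Gain in performance value from each level to the next."""
    return [later - earlier for earlier, later in zip(values, values[1:])]


def is_concave(values):
    """True when each successive gain is no larger than the one before it."""
    gains = marginal_gains(values)
    return all(later <= earlier for earlier, later in zip(gains, gains[1:]))


# Hypothetical performance values for one job at the five levels.
hypothetical_values = [5.0, 40.0, 62.0, 76.0, 84.0]

print(marginal_gains(hypothetical_values))
print(is_concave(hypothetical_values))
```

A job whose gains shrink in this way pays off less for each additional increment of performance, which is the property that matters when recruits are traded off across jobs.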

A second finding is that there is substantial variety in the shape as
well as the intercept (or "scale") of the functions across MOS. One can
interpret the scale differences as variations in the "average" value of
performance across jobs. In economic terms, this variation can be
interpreted as variation across jobs in the marginal product of job output;
that is, differences in the rate at which changes in productivity within a
single job contribute to total Army output. Differences in the shape of the
functions reflect variations in the way soldier performance at different
levels contributes to job output. One would expect, for instance, that
functions that are relatively "steep" at low performance levels would be
associated with jobs in which the cost of errors is high; and that jobs with
relatively steep slopes at high levels of performance would be those in which
the payoffs to exceptional performance are relatively high.

On the other hand, as one might expect from previous work in the area of
utility generalization, there also appear to be identifiable groups of MOS
with virtually identical functions. The task of identifying these groups and
examining their similarities with respect to both the definition and
prediction of performance is an important subject for further research
(Bobko, Karren, and Kerkar, 1987).

PERFORMANCE IN CLASSIFICATION DECISIONS

In this section we address several issues associated with the use of
performance value information to make manpower allocation decisions. First,
we examine the consequences of allocating people to jobs so as to maximize
the value of performance, rather than performance itself. Second, we explore
some of the issues associated with aggregation of performance value across
many assignments. Finally, in the concluding section we raise the issue of
how to determine whether or not the use of performance value information will
yield better results than would be obtained without it.

The Role of Performance Value in Manpower Allocation

In general, a policy that maximizes predicted performance and ignores the
value of performance will produce an allocation that has the following
characteristics:

(a)  the average level of performance will be highly variable across
jobs;

(b)  the level of expected performance will tend to be high in those jobs
for which performance is easiest to measure and predict;

(c)  neither job-specific differences in the way manpower contributes to
output nor variations in the importance to the organization of the output
from different jobs will be reflected in the allocation.

The extent to which these conditions are evident in practice will depend
on the following factors:

(1)  the distribution of the performance predictors in the population;

(2)  the degree to which performance is differently defined in different
jobs (that is, the dimensionality of performance);

(3)  the variability in validities across jobs and the relationship
between validity and job quotas;

(4)  the extent to which the allocation is constrained by considerations
other than performance.

The effects of (1) and (2) are easiest to explain if we examine them
together. If we look at the extremes of the range of these two factors, two
effects become clear: If performance is single-dimensioned, or if the
predictors of job performance are perfectly correlated in the population, the
allocation produced by maximizing expected performance will be exclusively
determined by variations across jobs in the predictability of performance.
If such variations do not exist, then there will be a multiplicity of
equivalent "optimal" allocations. At the other extreme, if performance is
uniquely defined for every job, and the predictors of performance are
perfectly negatively correlated, then the allocation resulting from
performance maximization will be unique and identical to the result that
would be produced by maximizing any increasing function of performance. In
other words, performance value will be irrelevant to the allocation problem.

With respect to the interaction between validities and job quotas noted
in (3), it is obvious that the consequences of variation in predictability
will become less pronounced as the variability decreases. Perhaps less
obvious is the fact that, if high validities are associated with jobs that
have large quotas, the effect of relatively small variations in validity will
be exaggerated far out of proportion to the degree of variation.

Finally, the effect of exogenous constraints (4) is to narrow the range
of feasible allocations. The more confining these constraints become, the
smaller will be the difference between the "best" and "worst" feasible
allocations and thus the smaller the difference induced by considerations of
either predicted performance or performance value. This factor is of
particular importance in the case of the Army's allocation problem, which is
circumscribed by an extensive set of policy and managerial constraints.
These include not only limitations imposed by force structure requirements
and the availability of training resources, but also a number of policy
constraints whose purpose is to insure an acceptable, if not optimal,
distribution of performance across jobs. This latter set of constraints
includes minimum job entry standards, an MOS priority system, and a set of
job-specific "quality goals" based on educational attainment and scores on the
Armed Forces Qualification Test (AFQT). One of the effects of these
constraints, when they are used in optimal assignment, is to mitigate the
effects of variation in validity and job quotas, producing an allocation in
which average performance is lower, but also less variable across jobs than
would occur without them.

Figure 2 demonstrates some of these effects for a sample of Army MOS.
The distribution displayed here was produced by assigning a random sample of
2232 1984 recruits to the nine jobs so as to maximize expected performance
while meeting job demands (scaled to the sample size in proportion to actual
1984 requirements) with recruits who met the minimum entry standards for the
jobs to which they were assigned. The optimization used a simple network
assignment algorithm that maximized the sum across all assignments of
predicted performance. Predicted performance was calculated using estimated
validities of current Aptitude Area scores against technical job performance.
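The optimization described above can be illustrated on a toy problem. The sketch below is not the network assignment algorithm used in the study; it simply enumerates every way of filling the job slots and keeps the one with the largest sum of predicted performance, which is only practical for a handful of recruits. The predicted-performance numbers, the quotas, and the function name `best_assignment` are hypothetical, and minimum entry standards are not modeled.

```python
from itertools import permutations


def best_assignment(predicted, quotas):
    """Exhaustively find the assignment with the highest total performance.

    predicted[r][j] is the predicted performance of recruit r in job j.
    quotas[j] is the number of positions in job j; the quotas must sum
    to the number of recruits. Returns (total, jobs) where jobs[r] is
    the job given to recruit r.
    """
    # Expand quotas into one slot per position, labeled by job index.
    slots = [job for job, quota in enumerate(quotas) for _ in range(quota)]
    n = len(predicted)
    if len(slots) != n:
        raise ValueError("quotas must sum to the number of recruits")

    best_total, best_jobs = float("-inf"), None
    for order in permutations(range(n)):
        # order[s] is the recruit placed in slot s.
        total = sum(predicted[r][slots[s]] for s, r in enumerate(order))
        if total > best_total:
            jobs = [None] * n
            for s, r in enumerate(order):
                jobs[r] = slots[s]
            best_total, best_jobs = total, jobs
    return best_total, best_jobs


# Hypothetical predicted performance (percentiles) for 4 recruits in 2 jobs.
predicted = [
    [70, 40],
    [60, 55],
    [30, 50],
    [65, 20],
]
quotas = [2, 2]

total, jobs = best_assignment(predicted, quotas)
print(total, jobs)
```

For realistic sample sizes an exhaustive search is infeasible, which is why a network or linear assignment method is needed in practice.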




Each bar in Figure 2 represents the mean performance level of the
recruits assigned to that job, with performance level measured in percentiles
based on sample scores. The validity associated with each job is indicated
at the tip of the bar for that job, and the sample N's are listed at the
bottom of the graph.

The effects discussed above are well illustrated by these results. The
distribution is highly variable across jobs. An ordering of the jobs by
their validities would yield a nearly identical list to that produced by
ranking average performance levels. The sole exceptions to this rule are MOS
C and F. (These two jobs use different predictors, and are significantly
different in size.) Finally, the interaction effect between validity and job
quotas can be seen by comparing the allocation to MOS A to that for MOS C.
MOS A, with a validity of .66 and a quota of 691, is assigned recruits
performing, on average, at about the 70th percentile. MOS C, which uses the
same predictor, has a validity only .07 less, but a quota only one seventh as
large, and receives an allocation performing nearly 50 percentiles lower.

Aggregating Performance Value Across Assignments

The question we address in this section is that of using the information
obtained in the exercises described above to measure the aggregate value of
performance. As we shall see, the choice of an approach to this issue can
have a significant effect on the distributions produced when the value
functions are used in an optimal assignment algorithm.

The allocation problem can be simply described as follows:

Let N be the total number of positions to be filled, M be the number of jobs,
and K the number of levels of performance. We can then represent any
assignment of N individuals to M jobs by an M x K matrix Q whose jth row,
$q_j$, describes the assignments to job j; we write $q_{ij}$ for the number
of individuals at performance level i assigned to job j. If we define
a K x 1 vector p such that $p_i$ is the quantity of performance obtained from
an individual performing at level i (the elements of p might be performance
percentiles, for instance), then we can define a scalar Z, the total quantity
of performance represented by the allocation Q, as

$$Z = \sum_{j=1}^{M} q_j p = \sum_{i=1}^{K} \sum_{j=1}^{M} q_{ij}\, p_i \qquad (1)$$

That is, the total quantity of performance represented by the allocation
Q is simply the sum of the number of individuals assigned to each job,
weighted by performance level. This is the definition of aggregate
performance that we will use. However, before continuing, it should be noted
that such a definition implicitly assumes that the total quantity of
performance obtained is independent of how performance is distributed within
and across jobs. In other words, we are ignoring issues relating to unit or
group performance.
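
A short sketch makes the independence assumption concrete. The function below computes the aggregate quantity of performance as the head count in each job and level, weighted by performance level; the numbers are hypothetical, and the layout (one row per job) is an assumption of this example.

```python
def aggregate_performance(Q, p):
    """Z = sum over jobs j and levels i of q_ij * p_i.

    Q[j][i] : number of individuals at performance level i in job j
              (one row per job)
    p[i]    : quantity of performance at level i (e.g. a percentile)
    """
    return sum(q_ji * p[i] for row in Q for i, q_ji in enumerate(row))

# Two jobs, three performance levels (hypothetical numbers).
p = [20, 50, 80]
Q_spread = [[1, 1, 1],      # each job gets one person at every level
            [1, 1, 1]]
Q_sorted = [[2, 1, 0],      # low performers concentrated in job 1,
            [0, 1, 2]]      # high performers concentrated in job 2
```

Both allocations use the same six people and give the same Z, which is exactly why this definition says nothing about how performance is spread across jobs.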

Given this definition of aggregate performance, we must define a way of
applying a performance value function to the quantity Z; that is, we must
define a function v(Z) using the information obtained in the value assessment
exercises described above. We shall consider two alternatives:




(a) That v(Z) is a "strongly separable" function of p and Q that can be
written in the form

$$v(Z) = \sum_{i=1}^{K} \sum_{j=1}^{M} q_{ij}\, u_j(p_i) \qquad (2)$$

where $u_j(p_i)$ is the value of performance at level i in job j.

If we assume strong separability, the marginal change in the value of
performance with respect to a change in the number of individuals at a given
performance level in a given job is constant, no matter how we specify the
function u(p). That is,

$$\frac{\partial v(Z)}{\partial q_{ij}} = u_j(p_i), \quad \text{for } 0 < q_{ij} < N,\ i \in K,\ j \in M. \qquad (3)$$

(b) That v(Z) is "weakly separable", that is,

$$v(Z) = \sum_{j=1}^{M} U_j(q_j p), \quad \text{where } q_j \text{ is the } j\text{th row of } Q. \qquad (4)$$

By relaxing the separability assumption, we allow the marginal value of
an additional assignment to a given job to vary with the total quantity of
performance in that job as well as with the performance level of the
particular assignment being considered:

$$\frac{\partial v(Z)}{\partial q_{ij}} = h_{ij}(p' q_j') \qquad (5)$$
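
The contrast between the two marginal values can be sketched numerically. The square-root value functions below are hypothetical stand-ins chosen only because they are concave and easy to check by hand; they are not the value functions elicited in the workshops.

```python
import math

def marginal_strong(u_j, p_i):
    """Strong separability: one more person at level p_i in job j
    always adds u_j(p_i), whatever the job already holds."""
    return u_j(p_i)

def marginal_weak(U_j, job_total, p_i):
    """Weak separability: the gain depends on the quantity of
    performance (job_total) already assigned to the job."""
    return U_j(job_total + p_i) - U_j(job_total)

u = math.sqrt   # hypothetical per-person value function
U = math.sqrt   # hypothetical job-level value function

gain_empty = marginal_weak(U, 0, 64)     # job holds no performance yet
gain_full = marginal_weak(U, 225, 64)    # job already holds 225 units
```

Under these assumptions the same recruit is worth 8 value units to an empty job but only 2 to a job that already holds 225 units of performance, whereas the strongly separable gain is 8 in both cases.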




The consequences of this difference for optimal assignment are easily
seen by comparing the maximization problems associated with the two
specifications.

Let $d_j$ represent the demand (quota) for job j, and $s_i$ be the supply of
applicants (recruits) predicted to perform at level i. (For now, we assume
that performance is unidimensional; that is, each applicant will perform at
the same level in all jobs.) Then the performance value functions defined by
(2) and (4) produce the following optimal assignment problems:


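(a) Maximize

$$\sum_{i=1}^{K} \sum_{j=1}^{M} q_{ij}\, u_j(p_i) \qquad (6)$$

(b) Maximize

$$\sum_{j=1}^{M} U_j(q_j p) \qquad (7)$$

Subject to:

$$\sum_{i=1}^{K} q_{ij} = d_j, \quad \text{for all } j \in M \quad \text{(demands)} \qquad (8)$$

$$\sum_{j=1}^{M} q_{ij} = s_i, \quad \text{for all } i \in K \quad \text{(supplies)} \qquad (9)$$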

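The equation systems defined by (a) and (b) can be transformed into single
equations using the method of Lagrange as follows:

(a)

$$L = \sum_{i=1}^{K} \sum_{j=1}^{M} q_{ij}\, u_j(p_i) - \sum_{j=1}^{M} \gamma_j \left( \sum_{i=1}^{K} q_{ij} - d_j \right) - \sum_{i=1}^{K} \alpha_i \left( \sum_{j=1}^{M} q_{ij} - s_i \right) \qquad (10)$$

or

(b)

$$L = \sum_{j=1}^{M} U_j(p' q_j') - \sum_{j=1}^{M} \gamma_j \left( \sum_{i=1}^{K} q_{ij} - d_j \right) - \sum_{i=1}^{K} \alpha_i \left( \sum_{j=1}^{M} q_{ij} - s_i \right) \qquad (11)$$

where $\gamma_j$ and $\alpha_i$ are sets of Lagrangian multipliers associated
with the demand and supply constraints.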

The conditions for a maximum of (a) will be easier to describe if we
order the values of $u_j(p_i)$ so that the following is true:

If $j' > j$ then $u_{j'}(p_i) < u_j(p_i)$

and if $i' > i$ then $u_j(p_{i'}) \le u_j(p_i)$

Then the matrix of assignments Q* that maximizes (a) will contain
elements $q^*_{ij}$ that meet the following condition:

$$q^*_{ij} = \min\left( s_i - \sum_{k=1}^{j-1} q^*_{ik},\ \ d_j - \sum_{m=1}^{i-1} q^*_{mj} \right) \qquad (12)$$

In other words, the maximum will be achieved by following the single rule of
"top-down" assignment: Order the set of possible person-job matches from
those with the highest value to those with the lowest; then assign
individuals at the highest available level of performance to the position
with the highest value at that level of performance until either the demand
is met or the supply is exhausted. The resulting allocation will be the one
that maximizes the variance in performance value across jobs.
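
The top-down rule can be sketched as a greedy fill over (level, job) matches. This is an illustration of the rule as stated, with hypothetical names and numbers; for arbitrary value tables a greedy fill is not guaranteed to be optimal, so the ordering conditions given in the text are assumed to hold.

```python
def top_down_assignment(value, supply, demand):
    """Greedy "top-down" fill of a level-by-job allocation.

    value[i][j] : u_j(p_i), value of level-i performance in job j
    supply[i]   : people available at level i
    demand[j]   : positions to fill in job j
    Returns q with q[i][j] people at level i assigned to job j.
    """
    supply, demand = list(supply), list(demand)   # work on copies
    q = [[0] * len(demand) for _ in supply]
    # All (level, job) matches, highest value first.
    cells = sorted(
        ((i, j) for i in range(len(supply)) for j in range(len(demand))),
        key=lambda ij: value[ij[0]][ij[1]],
        reverse=True,
    )
    for i, j in cells:
        n = min(supply[i], demand[j])   # stop at quota or empty pool
        q[i][j] = n
        supply[i] -= n
        demand[j] -= n
    return q

# Two levels (high, low) and two jobs; hypothetical values.
value = [[10, 6],
         [4, 3]]
allocation = top_down_assignment(value, supply=[3, 3], demand=[2, 4])
```

In this toy case the high-value job is filled entirely with high-level performers before the second job receives anyone, which is the concentration of talent the text describes.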

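The necessary (first order) conditions for a Q* that maximizes (b), the
weakly separable case, can be stated as follows:

$Q^* = [q^*_{ij}]$ such that

(i)

$$\frac{\partial L}{\partial q_{ij}} = \frac{\partial U_j(p' q_j')}{\partial q_{ij}} - \gamma_j - \alpha_i = 0, \quad \text{for all } i, j \qquad (13)$$

(ii)

$$\frac{\partial L}{\partial \alpha_i} = s_i - \sum_{j=1}^{M} q_{ij} = 0, \quad \text{for all } i \qquad (14)$$

(iii)

$$\frac{\partial L}{\partial \gamma_j} = d_j - \sum_{i=1}^{K} q_{ij} = 0, \quad \text{for all } j \qquad (15)$$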


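The solution of this system implies that, if the functions are continuous,
everywhere twice differentiable, and convex, there will exist a unique
optimal solution that is characterized by the following:

$$\frac{\partial U_j(\,)/\partial q_{ij}}{\partial U_k(\,)/\partial q_{ik}} = \frac{\partial U_j(\,)/\partial q_{mj}}{\partial U_k(\,)/\partial q_{mk}}, \quad \text{for all } i \ne m,\ j \ne k, \qquad (16)$$

$$\frac{\partial U_j(\,)/\partial q_{ij}}{\partial U_j(\,)/\partial q_{mj}} = \frac{\partial U_k(\,)/\partial q_{ik}}{\partial U_k(\,)/\partial q_{mk}}, \quad \text{for all } i \ne m,\ j \ne k. \qquad (17)$$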


That is, at optimality, the marginal rates of substitution across jobs for
the same performance level will be the same for all pairs of jobs and
performance levels, as will the marginal rates of substitution among
different performance levels within jobs.

If it is reasonable to assume that the judgements obtained in the Project
A utility workshops are valid, at least over a limited range, then the
generally curvilinear functions displayed in Figure 1 will, under weak
separability, produce an optimal allocation that is not a "corner solution";
that is, the maximization of performance value will tend to allocate some
high-level performers to all jobs. This will occur because the variations in
marginal value implied by the non-constant slopes of the curves will tend to
produce the equalities in (16) and (17) at values of $q_{ij}$ that are
neither 0 nor maximal. The result is that, even in MOS with very steep
average slopes, there will exist some point at which the payoff to an
additional assignment in this MOS is exceeded by that in another MOS with a
lower average performance value.
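
A toy comparison shows how a corner solution gives way to an interior one. Everything here is hypothetical: two jobs with four positions each, four high-level and four low-level recruits, linear per-person values for the strongly separable case, and square-root job-level values for the weakly separable case.

```python
import math

P_HIGH, P_LOW = 80, 20      # hypothetical performance percentiles
N_HIGH = N_LOW = 4          # recruits available at each level
QUOTA = 4                   # positions in each of two jobs, A and B

def job_totals(h):
    """Total performance in jobs A and B when A receives h high-level
    recruits (and QUOTA - h low-level ones); B receives the rest."""
    total_a = P_HIGH * h + P_LOW * (QUOTA - h)
    total_b = P_HIGH * (N_HIGH - h) + P_LOW * (N_LOW - (QUOTA - h))
    return total_a, total_b

def strong_value(h):
    # Linear, per-person values: u_A(p) = 1.5 p, u_B(p) = p.
    total_a, total_b = job_totals(h)
    return 1.5 * total_a + total_b

def weak_value(h):
    # Concave job-level values: U_A(T) = 1.5 sqrt(T), U_B(T) = sqrt(T).
    total_a, total_b = job_totals(h)
    return 1.5 * math.sqrt(total_a) + math.sqrt(total_b)

# Search over the number of high-level recruits sent to job A.
best_strong = max(range(N_HIGH + 1), key=strong_value)
best_weak = max(range(N_HIGH + 1), key=weak_value)
```

With these assumed functions the strongly separable objective sends all four high-level recruits to job A, while the weakly separable objective holds one back for job B even though job A has the steeper value function.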

Figures 3 and 4 illustrate the effects of weak versus strong separability
on the distribution of performance for the same sample represented in
Figure 2. All three figures were obtained using the same supplies, demands
and constraints. The only differences are in the objective functions that
were maximized.

Figure 3 presents the distribution produced when strongly separable value
functions are used to maximize the aggregate value of performance.

Comparing this result to Figure 2, we can see that the variability of the
distribution is increased by the consideration of performance value, although
the variation is less tightly linked to differences in validity. The effect
of the interaction between validity and quota, however, is markedly
[illegible] in the case of MOS A and C. The difference between MOS B and H,
however, is exaggerated by the use of utility. This occurs because MOS B has
both a low validity and a relatively flat value function, while the reverse
is true for MOS H.




Figure 4 displays the results when weak separability is assumed. Inter-job
variability is markedly reduced, with the most noticeable difference being
that MOS B receives a substantially increased level of performance.

Summary and Conclusions

The results pictured in Figures 2-4 provide ample evidence that variations
in performance value can make substantial differences in manpower
distributions. The question we shall briefly address in this section is that
of determining whether or not a given approach will yield results that are
"better" than the results produced by alternative approaches.

A  general  argument  can  be  made  to  the  effect  that: 

(1) The evidence of current practice, augmented by data from preliminary
workshops to assess different distributions, strongly suggests that
performance value must be considered in the assignment process.

(2) The procedures currently used to ensure acceptable distributions may,
in the current environment, produce results that are quite good, but these
procedures have three flaws: (a) they are based on predictors of performance
rather than predicted performance; (b) they are slow to adapt to changes in
Army needs and the operating environment; and (c) they are difficult to
explain and/or defend to interested parties such as DOD and Congress.

(3) Carefully estimated value functions are useful because they are (a)
more adaptable to changing circumstances (in doctrine, technology, recruiting
environment, etc.) than are the heuristics and political mechanisms currently
used to control distributions; and (b) more "rational", and thus easier to
describe to others.






[Figure: Performance Utility Functions for Nine Army MOS]


[Figure: Performance Distribution for 9 MOS, Ignoring Performance Utility. Vertical axis: mean performance percentile.]

Figure 2. Expected mean performance resulting from assignment
to maximize performance. Validity coefficients in each
MOS are shown above the bar.


[Figure: Performance Distribution for 9 MOS, Strongly Separable Utility Function. Vertical axis: mean performance percentile; horizontal axis: MOS.]

Figure 3. Expected mean performance resulting from assignment
to maximize the value of performance assuming
strongly separable value functions.


[Figure: Performance Distribution for 9 MOS, Weakly Separable Utility Function. Vertical axis: mean performance percentile; horizontal axis: MOS.]

Figure 4. Expected mean performance resulting from assignment
to maximize the aggregate value of performance assuming
weakly separable value functions.


DEVELOPING  BEHAVIORAL  RATING  SCALES  TO  EVALUATE 
SECOND-TOUR  PERFORMANCE  IN  THE  ARMY 


Elaine  D.  Pulakos 
Mary  Ann  Hanson 
Walter  C.  Borman 
Glenn Hallam
Gary Carter
Cynthia Owens-Kurtz

Personnel  Decisions  Research  Institute 


Presented  on  Symposium, 

"Junior  Noncommissioned  Officer  Job  Requirements: 
Where  Does  Leadership  Fit  In?" 

At  the  Annual  Convention  of  the 
American  Psychological  Association 
New  York 

August  1987 


The  views  expressed  in  this  paper  are  those  of  the  authors  and  do  not 
necessarily  reflect  the  official  opinions  and  policies  of  the  U.S.  Army 
Research  Institute  or  the  Department  of  the  Army. 




Developing  Behavioral  Rating  Scales  to  Evaluate 
Second  Tour  Performance  in  the  Army 

Abstract 

Using the critical incidents or behavioral analysis method, Army-wide
and MOS-specific performance requirements were identified for second
tour U.S. Army soldiers in nine representative occupational specialties.
Based on these performance incidents, behavioral categories or dimensions
were developed for evaluating second tour performance effectiveness.
Results of both the MOS-specific and Army-wide scale development
processes suggested that second tour soldiers perform most of the work
that first tour soldiers perform and also supervise that work.

Discussion focuses on the shortened set of procedures used to revise
first tour MOS-specific rating scales to measure second tour performance
and the nature of these first-line supervisor jobs in relation to the
importance of technical and supervisory duties for performing effectively.



Developing  Behavioral  Rating  Scales  to  Evaluate 
Second Tour Performance in the Army

This paper describes the procedures used to develop MOS-specific and
Army-wide behavioral rating scales for evaluating the performance of
second tour Army enlisted personnel. These scales were developed as part
of the Project A effort to evaluate the validity of current and new
predictors of soldier performance. The Project A research is being
conducted on 19 Army jobs (Military Occupational Specialties or MOS),
carefully selected to be representative of the entire population of Army
MOS.

A primary goal of Project A is to increase Army organizational
effectiveness by improving the job-soldier match. As an important step
towards this goal, a comprehensive set of selection and classification
measures (predictors) was developed and extensively field tested
(Peterson, 1986). In addition, a diverse and comprehensive first tour
criterion development effort was undertaken. Time and cost limitations
dictated that job-specific criterion measures be developed for just nine
of the target MOS. These job-specific criterion measures included
hands-on task proficiency measures, job knowledge tests (Campbell, Campbell,
Rumsey, & Edwards, 1986), and MOS-specific behavior-based rating scales
(Toquam, McHenry, Corpe, Rose, Lammlein, Kemery, Borman, Mendel, &
Bosshardt, 1986). To provide criterion measurement for the remaining 10
jobs, Army-wide rating scales applicable for evaluating first tour
soldier effectiveness in any MOS were developed (Borman, Motowidlo, Rose,
& Hanser, 1987; Pulakos & Borman, 1986).

During the summer and fall of 1985, a large-scale Concurrent
Validation was conducted, during which the predictor and criterion
measures were administered to several thousand soldiers in 19 target
jobs. Then, starting in the late summer of 1986, a Longitudinal
Validation data collection was begun in which all measures are being
administered to approximately 50,000 soldiers in a predictive validity
design. In addition to validating the predictors against first tour job
performance, the measures will also be validated against second tour job
performance, for those individuals in the sample who reenlist in the
Army.

In this paper, we describe the procedures used to develop Behavioral
Summary Scales (Borman, 1979) for evaluating second tour performance.
Performance requirements for second tour U.S. Army soldiers were a priori
thought to include both technical and supervisory components. That is,
we believed that second tour soldiers were responsible for performing
most of the technical work required of first tour soldiers and also
supervising that work. Accordingly, technical competence dimensions as
well as supervisory effectiveness dimensions would likely have to be
incorporated into the second tour performance measures. However, the
extent to which supervision is an important part of the second tour
soldier's job was thought to vary across the different MOS. This
suggested the possibility that some supervisory measures might be
Army-wide and thus applicable to all MOS, while other supervisory measures
might be MOS-specific and thus only relevant to a particular job.

Development of Second-Tour Army-Wide Rating Scales
Method  and  Results 

Behavior  Analysis  Workshops 

Sample. One hundred and seventy-two officers and NCOs participated
in half-day workshops intended to elicit behavioral examples of
second-tour soldier effectiveness. The workshops were conducted at Ft. Bragg,
NC and Ft. Carson, CO. The sample consisted of 154 males and 18 females;
136 were officers and 36 were NCOs. These individuals reported having an
average of 6.29 years in the Army and an average of 5.09 years occupying
supervisory positions.

Procedure. The inductive behavioral analysis strategy (Campbell,
Dunnette, Arvey, & Hellervik, 1973) requires persons familiar with a
job's performance demands to generate examples of effective, mid-range,
and ineffective behavior observed on that job. In the present
application, job behavior was defined broadly as any action related to
soldier effectiveness, and workshops were conducted in which participants
were asked to generate behavioral examples of what they considered to be the
second-tour soldier effectiveness domain.

A total of 1,000 behavioral examples were generated from the
workshops. These incidents were edited to a common format and then
content analyzed to form 12 preliminary dimensions of second tour
Army-wide effectiveness. The performance categories that had been developed
for the first tour soldiers were replicated for the second tour soldiers.
In addition, three generic supervisory dimensions emerged from the
content analysis of the incidents. Thus, categorization of the
performance examples suggested that second tour soldiers do, in fact, perform
most of the work that first tour soldiers perform and also supervise that
work. The 12 second tour performance dimensions were as follows:

A. Displaying Technical Knowledge/Skill

B. Displaying Effort, Conscientiousness, and
Responsibility

C. Organizing, Supervising, Monitoring, and
Correcting Subordinates (supervisory dimension)

D. Training and Developing (supervisory dimension)

E. Showing Consideration and Concern for
Subordinates (supervisory dimension)

F. Following Regulations/Orders and Displaying
Proper Respect for Authority

G. Maintaining Own Equipment

H. Displaying Honesty and Integrity

I. Maintaining Proper Physical Fitness

J. Developing Own Job/Soldiering Skills

K. Maintaining Proper Military Appearance

L. Controlling Own Behavior Related to Personal
Finances, Drugs/Alcohol, and Aggressive Acts

Retranslation  of  the  Behavioral  Examples 

Sample. The retranslation judges were a different group of
individuals than those who generated the critical incidents. This sample
consisted of 45 NCOs and 36 officers. There were 59 males and 22
females. The average time in the Army for the sample was 8.53 years and
the average amount of supervisory experience was 4.75 years. The
retranslation workshops were conducted at Ft. Knox, KY.

Procedures. Retranslation provides a way of checking on the clarity
of individual behavioral examples and of the dimension system. This is
accomplished by asking persons familiar with the target domain to make
two judgments about each behavioral example: the dimension or category
to which it belongs based on its content and the effectiveness level it
reflects. Examples where disagreement occurs in either category
membership or rated effectiveness level may be unclear and should be revised
or eliminated from further consideration. Also, confusion between two or
more categories in the sorting of several examples may reflect a poorly
formed category system.

To accomplish the retranslation task, judges were provided with
definitions of the 12 dimensions to aid in the sorting of behavior
examples into categories and a 7-point effectiveness scale (where 1 =
extremely ineffective, 4 = average, and 7 = extremely effective) to guide
the effectiveness ratings. Further, the retranslation task was divided
into four subtasks, each requiring a retranslation judge to evaluate 200
behavioral incidents. This division into subtasks was accomplished to
keep reasonable the amount of time each judge would need to spend on the
retranslation task.

Results. An initial screening of the data was undertaken to
identify and delete potential random responders or individuals who
obviously did not understand the retranslation task. Specifically,
respondents were scored on 12 critical incidents, each of which the
research staff believed was very straightforward to classify into one of
the 12 dimensions. If respondents did not correctly categorize at least
50% of these incidents, they were deleted from the sample. Seven
respondents out of the 81 total respondents were omitted from the sample
using this criterion, leaving a total sample size of 74 for the
retranslation analyses reported below.
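
The screening rule can be expressed as a short check. This is a sketch only: the function name, the use of dimension letters as answers, and the example responses are hypothetical; the 50% threshold and the 12 check incidents come from the text.

```python
def passes_screen(answers, key, min_fraction=0.5):
    """True if a judge sorted at least min_fraction of the check
    incidents into the dimension the research staff expected.

    answers : dimension letters chosen by the judge, one per check incident
    key     : expected dimension letters, in the same order
    """
    correct = sum(a == k for a, k in zip(answers, key))
    return correct >= min_fraction * len(key)

key = list("ABCDEFGHIJKL")            # 12 check incidents (hypothetical key)
careful = list("ABCDEFGHIJKA")        # 11 of 12 correct
borderline = list("ABCDEFAAAAAA")     # exactly 6 of 12 correct
random_like = list("LLLLLLLLLLLL")    # 1 of 12 correct
```

A judge with exactly half of the check incidents correct is retained, since the rule as stated removes only those who fall below 50%.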

Table 1 shows the number of behavioral examples that were reliably
retranslated for each of the 12 dimensions. The acceptance points for
retaining an example were greater than 50 percent for sorting the example
into a single dimension, and less than 1.50 standard deviation for the
effectiveness rating. These criteria left 734 of the 1,000 examples
(73.4%) to be included for subsequent scale development efforts. It
should be noted that the retranslation results indicated that all 12 of
the dimensions that resulted from the initial categorization of the
incidents should be retained.
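
The two acceptance rules can be sketched as a single function applied to each behavioral example. The function name and inputs are hypothetical, and the use of the sample standard deviation is an assumption, since the text does not say which estimator was used.

```python
from collections import Counter
from statistics import stdev

def retain_example(dimension_votes, effectiveness_ratings,
                   min_agreement=0.50, max_sd=1.50):
    """Apply the two acceptance rules to one behavioral example.

    dimension_votes       : dimension chosen by each judge
    effectiveness_ratings : 1-7 effectiveness rating from each judge
    Returns (keep, modal_dimension).
    """
    dimension, count = Counter(dimension_votes).most_common(1)[0]
    agreement = count / len(dimension_votes)
    sd = stdev(effectiveness_ratings)   # sample SD; an assumption
    keep = agreement > min_agreement and sd < max_sd
    return keep, dimension
```

For instance, an example sorted into dimension C by three of four judges with ratings of 6, 6, 7, and 5 would be retained, while one split evenly between two dimensions would not, because agreement must exceed 50 percent.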

Army-Wide Scale Development

The  results  in  Table  1  are  satisfactory  in  that  sufficient  numbers 
of  reliably  retranslated  examples  are  available  to  develop  behavioral 
summary  statement  anchors  for  each  dimension.  Typically,  a  minimum  of  20 
reliably  retranslated  examples  that  are  not  highly  overlapping  in  content 
is  considered  sufficient  for  defining  a  dimension. 

We  are  presently  in  the  process  of  developing  these  behavioral 
summary  statement  anchors  for  each  Army-wide  performance  dimension.  For 
each  dimension,  the  reliably  retranslated  examples  will  be  divided  into 
three  categories  of  effectiveness  levels:  low  (1  -  2.49),  average  (2.50 
-  5.49),  and  high  (5.50  -  7).  Behavioral  summary  statements  will  then  be 
written  to  capture  the  content  of  the  specific  examples  at  these  three 
effectiveness  levels. 
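
The three effectiveness bands can be sketched as a simple lookup. It is assumed here that each example is placed by its mean retranslation rating, and that values falling between the printed band edges (for example 2.495) belong to the lower band; neither detail is stated in the text.

```python
def effectiveness_level(mean_rating):
    """Map a mean 1-7 effectiveness rating to the three bands
    named in the text: low, average, high."""
    if not 1 <= mean_rating <= 7:
        raise ValueError("rating must lie on the 1-7 scale")
    if mean_rating < 2.50:
        return "low"
    if mean_rating < 5.50:
        return "average"
    return "high"
```

Grouping examples this way gives the three pools from which the low, average, and high summary statements for a dimension would be written.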

Development of the behavioral summary statements is the critical
step in forming Behavioral Summary Scales. The main advantage of these
scales over behaviorally anchored rating scales is that, for a particular
dimension and effectiveness level, the content of all of the reliably
retranslated examples is represented on the scales, not just one or two
of the specific behavioral examples (Borman, 1979). Accordingly, it is
more likely that a rater using the scales will be able to match observed
performance with the performance descriptions that appear on the scales.

Development of Second Tour MOS-Specific Rating Scales

Second tour MOS-specific rating scales were developed for nine jobs:
infantryman (11B), cannon crewmember (13B), tank crewmember (19E),
single-channel radio operator (31C), light wheel vehicle mechanic (63B),
motor transport operator (64C), administrative specialist (71L), medical
specialist (91A/B), and military police (95B). The approach used for
developing these rating scales differed from the approach used to develop
the second tour Army-wide rating scales. Whereas the second tour
Army-wide rating scales were developed using the entire sequence of
behavioral summary scale procedures, development of the second tour
MOS-specific rating scales involved revising the first tour MOS-specific
rating scales so that they would be appropriate for evaluating second tour
performance.

Behavior Analysis Workshops

Sample. A behavior analysis workshop was conducted with officers
and NCOs in each of the nine target jobs to generate examples of
effective, average, and ineffective second tour MOS-specific job
performance. Approximately 25 individuals participated in each workshop. The
participants had an average of 8.42 years in the Army and an average of
5.78 years of supervisory experience. The workshops were conducted at
Ft. Knox, KY, Ft. Bragg, NC, Ft. Carson, CO, Ft. Sam Houston, TX, Ft.
Gordon, GA, and Ft. Hood, TX.

Procedure. The same procedures used to generate the Army-wide
behavior examples were used in the MOS-specific behavior analysis
workshops. However, rather than writing examples that would be
applicable to any MOS, participants were instructed to write behavior
examples that were specific to the particular job for which they were
writing incidents. The numbers of behavioral examples generated for each
MOS were as follows: 11B (161 examples), 13B (58 examples), 19E (236
examples), 31C (212 examples), 63B (180 examples), 64C (184 examples),
71L (149 examples), 91A/B (206 examples), and 95B (234 examples).




Comparison of First Tour and Second Tour Behavior Examples

The behavior incidents were first edited to a common format. Then,
they were categorized for each job using the first tour MOS-specific
category system as a starting framework. If a second tour incident did
not "fit" into an already existing first tour category, an entirely new
category was introduced. Through this process, it was possible to
determine whether the same or different categories should be used for
evaluating second tour performance as were used to evaluate first tour
performance. This exercise also yielded information regarding what
specific category additions or deletions were necessary to
comprehensively tap the second tour performance domain.

Almost all of the first tour MOS-specific rating categories were
replicated in some form for the second tour jobs. For each category that
was both a first tour and a second tour dimension, the next step was to
examine the content of the incidents to determine whether or not the
performance requirements were appreciably different for second tour than
for first tour soldiers. This was an important step because although the
names of the performance dimensions for first and second tour soldiers
might be the same, it was at least possible that the dimension
definitions or anchors might need to be modified/revised in order to make
the scales appropriate for evaluating second tour performance.

For some dimensions, comparisons of the first and second tour
behavior incidents indicated that more was expected of second tour
soldiers performing at the average or high levels of performance than was
expected of their first tour counterparts. In other cases, low level
performance for a first tour soldier seemed "too low" for individuals in
their second tour. Under such circumstances, the summary statement
anchors were modified to reflect the appropriate performance standards.
For other dimensions, the incidents suggested that second tour soldiers
were responsible for knowing how to operate and maintain more/different
pieces of equipment than were the first tour soldiers. Again, this type
of difference was incorporated into the second tour summary statements.

For several of the MOSs, the second tour incidents also suggested
that  some  new,  MOS-specific  supervisory  categories  should  be  developed. 
Accordingly,  preliminary  summary  statement  anchors  were  written  for  these 
supervisory  dimensions.  In  developing  the  categories,  however,  care  was 
taken  not  to  duplicate  the  Army-wide  supervision  categories,  which  would 
be  used  to  evaluate  individuals  in  all  MOSs.  That  is,  if  the  supervisory 
incidents  reflected  the  same  types  of  behaviors  that  were  already  being 
tapped  by  the  Army-wide  supervisory  dimensions,  then  no  MOS-specific 
supervisory  dimensions  were  developed.  Thus,  the  MOS-specific  categories 
reflected  aspects  of  supervision  that  were  relevant  only  to  the 
particular  job  in  question.  The  names  of  the  second  tour  supervisory 
performance dimensions by MOS are shown in Table 2. As can be seen
from  the  table,  MOS-specific  supervisory  dimensions  were  developed  for 
five  of  the  nine  MOSs. 

Scale  Revision  Workshops 

Sample. For each MOS, two scale revision workshops were conducted
with 10-14 participants in each. These individuals were different from
those who generated the behavior examples. Approximately half of the
participants were officers and the other half were NCOs. The sample
reported an average of 5.86 years in the Army and an average of 3.43
years of supervisory experience. The scale revision workshops were
conducted at Ft. Bragg, NC and Ft. Carson, CO.

Procedure.  The  purpose  of  the  scale  revision  workshops  was  to  have 
subject matter experts review the proposed second tour performance
categories  and  make  any  revisions  to  the  dimension  definitions  and 
anchors  that  were  necessary  to  make  the  scales  appropriate  for  evaluating 
second  tour  MOS-specific  performance.  Participants  were  told  that  three 
focal  questions  needed  to  be  addressed  during  the  workshops: 

.  Do the dimension anchors contain material that is not relevant
   for evaluating second tour soldier effectiveness?

.  Do the dimension anchors for various levels of effectiveness
   accurately reflect what would be expected of a second tour
   soldier performing at the ineffective, average, and effective
   levels of performance?

.  Do the proposed dimensions tap all of the MOS-specific
   performance components of the second tour soldier's job?

To  answer  these  questions,  the  workshop  leader  reviewed  each 
dimension in detail with the workshop participants. One by one, the
three  summary  statement  anchors  describing  ineffective,  average,  and 
effective  performance  for  each  dimension  were  discussed.  Participants 
were  asked  to  think  about  second  tour  performance  expectations  and 
recommend  any  changes  that  they  deemed  necessary  to  make  the  scales 
maximally  relevant  for  evaluating  second  tour  soldiers. 

Based on the input from the workshop participants, the scales were
revised. In most cases, only minor wording changes were made to the
summary statements. In a few cases, however, the dimensions themselves
as well as their anchors were changed substantially. Substantial changes
to the dimensions were usually a result of the job requirements having
actually changed since the time the first tour scales were developed and
the second tour behavior incidents were collected. Workshop participants
made a final review of the proposed changes to the rating scales before
being dismissed.

Retranslation Workshops

Sample. For each MOS, a retranslation workshop was conducted with
approximately 20 officers and NCOs. The total number of individuals
participating in the retranslation workshops across all MOSs was 193.
Workshop participants were again different from those who generated the
critical incidents and those who reviewed and revised the proposed second
tour rating scales. For this sample, the average time in the Army was
7.34 years and the average amount of supervisory experience was 3.96
years. Retranslation workshops were conducted at Ft. Carson, CO and Ft.
Lewis, WA.

Procedure. The purpose of the retranslation workshops was to check
on the intended effectiveness levels of the behavioral summary statements
anchoring each MOS-specific performance dimension as well as to check on
the dimension structures themselves. It is important to clarify that
rather than retranslating individual behavior examples (as was the case
with the Army-wide retranslation workshops described above), participants
were asked to retranslate the actual summary statements that would be
used to anchor the rating scale dimensions.

Recall that there were three summary statements anchoring each
dimension: one describing low level or ineffective performance, one
describing middle level or average performance, and one describing high
level or effective performance. Participants were provided with
definitions of each dimension and a booklet containing the summary
statements listed in a random order. They were asked to make two
judgments about each summary statement: the dimension or category to
which it belonged based on its content and the effectiveness level it
represented from 1 (very ineffective) to 7 (very effective). The number
of dimensions for the different MOSs ranged from a minimum of seven (for
the 71Ls) to a maximum of 14 (for the 95Bs). Thus, judges were
required to make from 21 to 42 judgments for this retranslation task.
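
The range of 21 to 42 follows from three summary statements per dimension. A minimal Python check of that arithmetic is given below; the function name is hypothetical, and each summary statement is counted once even though it received both a category judgment and an effectiveness rating.

```python
ANCHORS_PER_DIMENSION = 3  # ineffective, average, and effective anchors

def statements_to_retranslate(n_dimensions):
    """Number of summary statements in the booklet for one MOS."""
    return n_dimensions * ANCHORS_PER_DIMENSION

# Dimension counts reported in the text for the smallest and largest scales
dimension_counts = {"71L": 7, "95B": 14}
statement_counts = {
    mos: statements_to_retranslate(n) for mos, n in dimension_counts.items()
}
print(statement_counts)
```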

Results. Again, an initial screening of the data was undertaken to
identify and delete potential random responders or individuals who
obviously did not understand the retranslation task. For each MOS,
respondents were scored on approximately 10 critical incidents, each of
which the research staff believed was very straightforward to classify
into one of the performance dimensions. If respondents did not correctly
recategorize at least 50% of their incidents, they were deleted from the
sample. Of the 193 total participants in the retranslation workshops, 22
were excluded from the retranslation analyses reported below.
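
The screening rule can be sketched as follows. This is a minimal illustration assuming each respondent's answers and the staff's intended categories are stored as dictionaries keyed by incident; the function name, data layout, and example respondents are hypothetical.

```python
def passes_screening(responses, answer_key, min_proportion=0.5):
    """Return True if the respondent placed at least min_proportion of the
    screening incidents in the category the research staff intended."""
    n_correct = sum(
        1 for incident, category in answer_key.items()
        if responses.get(incident) == category
    )
    return n_correct / len(answer_key) >= min_proportion

# Hypothetical key of 10 straightforward incidents, all belonging to category "A"
key = {f"incident_{i}": "A" for i in range(10)}
careful = {f"incident_{i}": "A" for i in range(10)}                          # 10 of 10 correct
borderline = {f"incident_{i}": ("A" if i < 5 else "B") for i in range(10)}   # 5 of 10 correct
random_like = {f"incident_{i}": ("A" if i < 3 else "B") for i in range(10)}  # 3 of 10 correct

# Sample size remaining after the exclusions reported in the text
retained = 193 - 22
print(passes_screening(careful, key), passes_screening(random_like, key), retained)
```

Under this reading of the rule, a respondent with exactly half of the screening incidents correct is retained, because the criterion is "at least 50%".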

For almost all (98%) of the summary statements for all of the nine
MOSs, at least half of the retranslation sample placed them in the
intended category, and for 92% of the statements, more than 75% of
the sample categorized them as intended. The mean effectiveness level
was also very close to the intended effectiveness level for most of the
summary statements. That is, if the statement was intended to be a low
level or ineffective anchor, its mean effectiveness level was about 1.
For those intended to be a middle level or average anchor, the mean
effectiveness level was about 4, and for those intended to be a high
level or effective anchor, the mean effectiveness level was about 7.




There were a few statements (about 14% across all MOSs), however, for
which there was some discrepancy between the mean effectiveness level and
the intended effectiveness level (i.e., the effectiveness rating was more
than one point away from the intended effectiveness level). Revisions
were made to such statements to ensure that they reflected the proper
effectiveness levels.
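
The two retranslation statistics described above, percent agreement on the intended category and the mean effectiveness rating, can be computed for a single summary statement as in the sketch below. The intended scale values of 1, 4, and 7 are taken from the means reported in the text; the function name, data layout, and example judgments are hypothetical.

```python
from statistics import mean

# Intended scale values for the three anchors (assumed from the reported means)
INTENDED_LEVEL = {"low": 1, "average": 4, "high": 7}

def summarize_statement(judgments, intended_category, intended_anchor):
    """judgments is a list of (category, effectiveness rating 1-7) pairs,
    one pair per judge, for a single summary statement."""
    n_judges = len(judgments)
    n_agree = sum(1 for category, _ in judgments if category == intended_category)
    mean_rating = mean(rating for _, rating in judgments)
    target = INTENDED_LEVEL[intended_anchor]
    return {
        "pct_intended_category": 100 * n_agree / n_judges,
        "mean_effectiveness": mean_rating,
        # Flag statements whose mean is more than one point from the target
        "flag_for_revision": abs(mean_rating - target) > 1,
    }

# Hypothetical judgments from four judges on two "high" anchors
on_target = [("Leading the Team", 6), ("Leading the Team", 7),
             ("Leading the Team", 5), ("Other", 6)]
off_target = [("Leading the Team", 5), ("Leading the Team", 5),
              ("Other", 6), ("Leading the Team", 5)]

summary_ok = summarize_statement(on_target, "Leading the Team", "high")
summary_flagged = summarize_statement(off_target, "Leading the Team", "high")
print(summary_ok)
print(summary_flagged)
```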

Discussion 

The MOS-specific second tour rating scales appear ready for field
testing. Retranslation results indicate that the category system for each
MOS's scales and the effectiveness levels reflected in the summary
statements anchoring the rating categories will provide a comparatively
unambiguous rating format for evaluating second tour soldier performance
in these MOSs. The Army-wide scale development effort is nearing
completion. All that remains is preparation of behavioral summary
statements to anchor the three general levels of effectiveness for each
of the Army-wide dimensions. The rest of this discussion focuses on the
"shortcut" method used here to develop second tour MOS-specific scales and
inferences that might be made about the nature of the second tour soldier
job based on the content of the behavioral incidents gathered.

Comments on the "Shortcut" Method for MOS-Specific Scale Development

One lesson learned from the MOS-specific scale development effort
is that a procedure less time consuming than the usual behavioral scale
development sequence may be very effective when behavior-based rating
scales for a similar job are already available. The typical approach for
constructing such scales is to elicit large numbers of behavioral
examples, develop dimensions based on the content of the examples, have
the examples retranslated into those dimensions and according to
effectiveness level, and write behavioral summary statements for each
performance level on each dimension. In addition, the summary statements
are often reviewed by job experts before the scales are put in final
form.

Because the first tour behavioral rating scales were available
for each of the nine MOSs and because the second tour performance
requirements were reported to be similar in many ways to first tour
requirements, it seemed appropriate in our research to simplify the
MOS-specific scale development procedures. Accordingly, and as mentioned
previously in this paper, the first tour scales were used as a starting
point in the present effort. Those parts of the scales requiring changes
were revised utilizing a relatively small number of performance examples
and a group of job experts working directly on the scales' summary
statements. This shortened procedure reduced considerably the time and
expense needed for rating scale development without reducing the quality
of the scales, as was apparent from the favorable retranslation results
for the final summary statements.

On the Nature of the Second Tour NCO Job: Technical and
Supervisory Duties

An important job content-related issue addressed by these rating
scale development results concerns the nature of the second tour NCO job.
Specifically, second tour soldiers have a variety of performance
requirements, some involving technical aspects of the job and others
relating to supervisory duties. Second tour NCOs both perform and
supervise the work. Data gathered in the present effort provide some
information relevant to determining the salience of the technical versus
supervisory elements of these jobs.




In particular, the content of the performance examples or
incidents gathered for the nine MOSs should reveal estimates of the
relative importance of the technical and supervisory aspects of the
second tour soldier job. Consideration of how these performance incidents
were elicited will clarify why this is so. Recall that the NCOs and
their supervisors from each of the target MOSs were asked in a workshop
setting to record behavioral incidents they recalled from observing
second tour soldiers working in these MOSs. Workshop participants were
told that the incidents could refer to any part of the job for that MOS,
so we would expect that the content of a large number of incidents
gathered inductively in this manner should representatively sample the
different elements of the job.

More precisely, we would expect that the performance incidents
elicited this way would reflect a representative sample of the job
content related to performance requirements, that is, what it takes to be
effective on these jobs (rather than, for example, the time spent on
different job activities). This is because the behavioral analysis
method draws out incidents whose content relates to effectiveness on the
job. As mentioned previously in this section, we did not gather a large
number of performance incidents for each MOS, but the incidents we did
collect, across all MOSs, and the Army-wide incidents appear to yield a
sufficient sampling to provide a look at the issue of job content,
technical versus supervisory, related to performance requirements on
these jobs.

Table 3 shows the percent supervisory performance incidents, as
judged by our research staff, for each of the nine MOSs, along with the
total percentage of MOS-specific incidents that were supervisory in
nature, across all nine MOSs. Referring to individual MOSs, second tour
infantrymen and light wheel vehicle mechanics seem to do the most
supervising, while tank crewmen and vehicle operators are involved least
in supervising soldiers.
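
The percentages in Table 3 can be recomputed from the incident counts, as in the following Python sketch. The counts are transcribed from Table 3; the variable and function names are hypothetical.

```python
# MOS -> (total incidents, supervisory incidents), transcribed from Table 3
TABLE_3 = {
    "11B": (159, 71), "13B": (57, 13), "19E": (236, 27),
    "31C": (212, 49), "63B": (180, 76), "64C": (184, 31),
    "71L": (156, 36), "91A": (89, 33), "95B": (234, 73),
}

def percent_supervisory(total, supervisory):
    """Supervisory incidents as a percentage of all incidents."""
    return 100 * supervisory / total

percents = {mos: percent_supervisory(*counts) for mos, counts in TABLE_3.items()}

# Rank the MOSs from most to least supervisory content
ranked = sorted(percents, key=percents.get, reverse=True)

# Overall percentage across all nine MOSs
total_incidents = sum(total for total, _ in TABLE_3.values())
total_supervisory = sum(sup for _, sup in TABLE_3.values())
overall = percent_supervisory(total_incidents, total_supervisory)

print("Highest:", ranked[:2])
print("Lowest:", ranked[-2:])
print("Overall:", round(overall, 1))
```

The two highest and two lowest entries in the ranking correspond to the MOSs singled out in the paragraph above (11B and 63B at the top, 64C and 19E at the bottom).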

Comparing Table 3 with Table 2, notice that our decision to develop
(or not to develop) MOS-specific supervisory categories for each MOS was
not directly related to the percentage of supervisory incidents gathered
for that MOS. Rather, as mentioned previously, MOS-specific supervisory
categories were developed only when the incidents for that MOS reflected
aspects of supervision which were not tapped by the Army-wide supervisory
dimensions.

Table 4 presents a more detailed analysis of the MOS-specific
supervisory performance incidents. Also shown in Table 4 is the
percentage of the 734 Army-wide incidents reliably retranslated into the
supervisory performance dimensions in the Army-wide scale development
effort. Although the total percentages of MOS-specific and Army-wide
supervisory incidents are reasonably close (27.1% and 30.5%), the
distribution of these incidents to individual supervisory categories is
very uneven across the two sources of incidents. The vast majority of
the MOS-specific supervisory incidents fall in the Organizing,
Supervising, Monitoring, and Correcting dimension, whereas supervisory
incidents are more uniformly spread across all three categories in the
Army-wide case. The small number of MOS-specific incidents in the
"Showing Concern" dimension is probably due to the more generic nature of
that dimension and the instructions to MOS-specific workshop
participants to focus on performance examples relevant only to the target
MOS. It is not clear why there are differences in the patterns of
incidents for the other two supervisory categories, although sampling
error is certainly a possible reason for such differences.
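
To make the unevenness concrete, the counts in Table 4 can be re-expressed as shares of the supervisory incidents within each source. The sketch below is an illustrative recomputation, not an analysis reported by the authors; the counts are transcribed from Table 4 and the names are hypothetical.

```python
# Category -> (MOS-specific count, Army-wide count), transcribed from Table 4
TABLE_4 = {
    "Organizing, Supervising, Monitoring, and Correcting": (310, 99),
    "Training and Developing": (82, 63),
    "Showing Consideration and Concern": (17, 62),
}
N_MOS_SPECIFIC = 1507  # all MOS-specific incidents (Table 3 total)
N_ARMY_WIDE = 734      # all reliably retranslated Army-wide incidents

def within_source_shares(counts):
    """Each category's share (in percent) of one source's supervisory incidents."""
    total = sum(counts)
    return [100 * c / total for c in counts]

mos_counts = [mos for mos, _ in TABLE_4.values()]
army_counts = [army for _, army in TABLE_4.values()]

mos_shares = within_source_shares(mos_counts)
army_shares = within_source_shares(army_counts)

# Percent of all incidents that were supervisory, by source
mos_total_pct = 100 * sum(mos_counts) / N_MOS_SPECIFIC
army_total_pct = 100 * sum(army_counts) / N_ARMY_WIDE

for name, m, a in zip(TABLE_4, mos_shares, army_shares):
    print(f"{name}: MOS-specific {m:.1f}%, Army-wide {a:.1f}%")
```

On these counts, about three-quarters of the MOS-specific supervisory incidents fall in the first category, compared with fewer than half of the Army-wide supervisory incidents.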

At any rate, results in Tables 3 and 4 suggest that indeed the
second tour soldier job has performance requirements in both the
technical and supervisory areas. For most of the MOSs, roughly
one-quarter to one-third of the performance demands are likely to be
supervisory in nature, with the rest in the technical arena. This
finding has, of course, important implications for soldier selection,
as well as for the training and retention of second tour Army personnel.
Selection concerns need to focus on personal characteristics relevant to
supervisory success in addition to aptitudes and abilities important for
obtaining technical knowledge and skills necessary for the technical
aspects of the job. Training must emphasize skill-building instruction
and on-the-job experiences related to technical and supervisory aspects
of the job. And retention of second tour soldiers with skills and
potential in both areas should be explicitly encouraged. Project A
researchers are attending to these implications in continuing efforts to
improve the overall effectiveness of the U.S. Army.




References 


Borman, W. C. (1979). Format and training effects on rating accuracy and
rating errors. Journal of Applied Psychology, 412-421.

Borman, W. C., Motowidlo, S. J., Rose, S. R., & Hanser, L. M. (1987).
Development  of  a  model  of  soldier  effectiveness.  ARI  Technical 
Report  741.  Alexandria,  VA:  U.S.  Army  Research  Institute  for  the 
Behavioral  and  Social  Sciences. 

Campbell, C. H., Campbell, R. C., Rumsey, M. G., & Edwards, D. C. (1986).
Development and field test of Project A task-based MOS-specific
criterion measures. ARI Technical Report 717. Alexandria, VA: U.S.
Army Research Institute for the Behavioral and Social Sciences.

Campbell, J. P., Dunnette, M. D., Arvey, R., & Hellervik, L. (1973).
The development and evaluation of behaviorally based rating scales.
Journal of Applied Psychology, 15-22.

Davis, R. H., Davis, G., Joyner, J., & deVera, M. V. (1986).
Development and field test of job relevant knowledge tests for
selected MOS. ARI Technical Report 757, in press. Alexandria,
VA: U.S. Army Research Institute for the Behavioral and Social
Sciences.

Peterson, N. G. (Editor). (1986). Development and field test of the
trial battery for Project A. ARI Technical Report 739. Alexandria,
VA: U.S. Army Research Institute for the Behavioral and Social
Sciences.

Pulakos, E. D., & Borman, W. C. (Editors). (1986). Development and
field test of the Army-wide rating scales and the rater orientation
and training program. ARI Technical Report 716. Alexandria, VA: U.S.
Army Research Institute for the Behavioral and Social Sciences.




Toquam, J. L., McHenry, J. J., Corpe, V. A., Rose, S. R., Lammlein, S. E.,
Kemery, E., Borman, W. C., Mendel, R., & Bosshardt, M. (1986).
Development and field test of behaviorally anchored rating scales for
nine MOS. ARI Technical Report 776. Alexandria, VA: U.S. Army Research
Institute for the Behavioral and Social Sciences.




Table 1. Summary of Reliably Retranslated Second Tour
Army-Wide Incidents by Category

     Category                                        #/% of Incidents

A.   Displaying Technical Knowledge/Skill                 55/7%
B.   Displaying Effort, Conscientiousness,
     and Responsibility                                  168/23%
C.   Organizing, Supervising, Monitoring,
     and Correcting Subordinates                          99/13%
D.   Training and Developing                              63/9%
E.   Showing Consideration and Concern
     for Subordinates                                     62/8%
F.   Following Regulations/Orders and
     Displaying Proper Respect for Authority              59/8%
G.   Maintaining Own Equipment                            21/3%
H.   Displaying Honesty and Integrity                     59/8%
I.   Maintaining Proper Physical Fitness                  32/4%
J.   Developing Own Job/Soldiering Skills                 43/6%
K.   Maintaining Proper Military Appearance               27/4%
L.   Controlling Own Behavior Related to
     Personal Finances, Drugs/Alcohol, and
     Aggressive Acts                                      46/6%



Table 2. Supervisory Performance Categories for
Second Tour MOS-Specific Scales

MOS      Supervisory Performance Categories

11B      Supervising Soldiers in the Field
         Leading the Team
13B      None
19E      Assuming Supervisory Responsibilities in
         Absence of Tank Commander
31C      Managing the RATT Rig
63B      Checking Repairs Made by Other Mechanics
64C      None
71L      None
91A/B    None
95B      Leading the Team in a Tactical Environment



Table 3. Percent Supervisory Performance Incidents
From MOS-Specific Workshops

                                            Percent
                           Number of        Supervisory
          Total Number     Supervisory      MOS-Specific
MOS       of Incidents     Incidents        Incidents

11B            159              71             44.7%
13B             57              13             22.8%
19E            236              27             11.4%
31C            212              49             23.1%
63B            180              76             42.2%
64C            184              31             16.8%
71L            156              36             23.1%
91A             89              33             37.1%
95B            234              73             31.2%

Totals        1507             409             27.1%



Table 4. Numbers and Percent Performance Incidents
By Supervisory Category

                                          MOS-Specific        Army-Wide
                                           Incidents          Incidents
                                           N       %          N       %

1. Organizing, Supervising, Monitoring,
   and Correcting Subordinates            310    20.6%        99    13.5%
2. Training and Developing
   Subordinates                            82     5.4%        63     8.6%
3. Showing Consideration and
   Concern for Subordinates                17     1.1%        62     8.4%

   Totals                                 409    27.1%       224    30.5%



GETTING  ANSWERS  TO  THE  RIGHT  QUESTIONS: 
JOB  ANALYSIS  STRATEGY 


Michael  G.  Rumsey 
U.S.  Army  Research  Institute 


Presented  on  Symposium, 

"Junior  Noncommissioned  Officer  Job  Requirements; 
Where  Does  Leadership  Fit  In?" 

At  the  Annual  Convention  of  the 
American  Psychological  Association 
New  York 

August  1987 


The  views  expressed  in  this  paper  are  those  of  the  authors  and  do  not 
necessarily  reflect  the  official  opinions  and  policies  of  the  U.S.  Army 
Research  Institute  or  the  Department  of  the  Army. 




Getting Answers to the Right Questions:
Job Analysis Strategy

To many, the words "job analysis" fail to generate a sudden surge of
excitement. Instead, they may evoke images of mindless automatons poring over
endless lists of task statements. The practitioners in this field should be
forgiven if they sometimes find themselves identifying with Rodney
Dangerfield.

Such an image of job analysis in fact poorly represents the nature of the
activity. It encompasses issues which are challenging, stimulating and
critically important. Consider this situation. You are building an enlisted
selection and classification system for the entire U.S. Army. You want to
test the validity of this system in as rigorous a manner as possible, so you
have set about to build a comprehensive set of criterion measures to capture
soldier performance in both the first and second tours of duty. Your mission
is partially complete; you have finished development of first tour measures.

Now, as you approach development of second tour measures, you realize
answers to several key questions are needed before you can proceed. First,
what should be the content of these measures? Second, are separate measures
needed for each job? Or are the jobs so similar that the same measures can be
applied to all? Third, to what extent can first tour measures be used in
second tour? You do not want to squander valuable resources to develop new
second tour measures if there is really no major difference between first and
second tour performance. Fourth, what kinds of measurement methods are
needed? These need to be suitable to the job requirements.




Clearly, you now need job analysis information. The challenge here is to
develop that job analysis strategy which will not only identify and prioritize
job components, but which will furthermore provide sufficient information to
ensure a maximally effective set of performance measures. Such a strategy
should yield as comprehensive a job picture as possible. Multiple methods are
to be preferred as likely yielding more complete information than might a
single method. To the degree feasible, all relevant and useful sources of
information should be consulted.

As  you  have  probably  guessed,  the  scenario  I  have  been  describing  is  not 
merely  a  hypothetical  one.  It  is  essentially  the  situation  we  found  ourselves 
in  as  we  prepared  to  analyze  nine  second  tour  jobs  in  Project  A,  a  large  scale 
project  to  develop  performance-based  selection  and  classification  measures  for 
the Army. Before the other members of this panel tell you what we have been
learning  from  these  analyses,  I  would  like  to  spend  the  next  few  minutes 
describing  the  overall  strategy  that  guided  our  efforts. 

At the outset of this project, we had advanced a general strategy for job
analysis  designed  to  provide  both  good  overall  job  coverage  and  a  basis  for 
discriminating  between  good  and  poor  performers.  A  multimethodological 
approach  was  adopted  which  incorporated  two  of  the  three  basic  types  of  job 
analysis  methods  identified  by  Ash  (1982):  task-based  and  behavior-based. 

The task-based approach involved heavy reliance on existing job information,
supplemented by interviews with cognizant subject matter experts, to first
identify a consolidated domain of all tasks within a job. From this domain, a
smaller set of tasks was to be identified which could best represent the full
domain for testing purposes. Finally, the tasks in the smaller set were
divided into discrete steps (Campbell, Campbell, Rumsey, & Edwards, 1985).

The behavior-based approach involved workshops in which subject matter
experts on the job generated examples of good, poor and average performance.
These examples were then clustered into dimensions (Toquam, McHenry, Corpe,
Rose, Lammlein, Kemery, Borman, Mendel, & Bosshardt, in preparation).

This general approach was, in our judgment, reasonably successful for the
analysis of first tour jobs. It led to measures which were judged by
responsible Army proponents to provide adequate job coverage and which
provided reasonable discrimination among those tested. But the job
requirements at the first tour level were relatively uncomplicated. A soldier
was essentially expected to be able and willing to do the work required. Among
the second tour soldiers we would be examining, many would have advanced to a
junior non-commissioned officer level. The available literature (Hebein,
Kaplan, Miller, Olmstead & Sharon, 1984; Wallis, Korotkin, Yarkin-Levin,
Schemmer, & Mumford, 1985), as well as preliminary soldier interviews,
indicated that at this level soldiers would have supervisory as well as
technical job requirements. Would the first tour job analysis approach still
suffice for soldiers required to assume responsibility for the work and
behavior of others?

It is as true in job analysis as elsewhere that the answers one gets are
to no small degree a function of the questions one asks. In our
behavior-based approach we had been asking essentially two kinds of questions:
what are critical behaviors for effective performance on a specific job and
what are critical behaviors for effective performance on any type of Army job?
These questions seemed sufficiently encompassing to capture both supervisory
and non-supervisory job requirements.

Our real concern was what we would find, or fail to find, using the
task-based approach. Let us for the moment split second tour requirements into
two categories, technical and supervisory, recognizing that such a dichotomy
represents a gross oversimplification. In other contexts, we will be using
the word "technical" in a much more restricted way.

A  task  is,  by  one  definition,  an  observable,  measurable  action,  with  a 
definite  beginning  and  end,  which  is  performed  for  a  relatively  short  period 
of  time  (Devries,  Eschenbrenner  &  Ruck,  1980,  pp.  10,  13).  This  definition 
fits  technical  tasks  reasonably  well;  in  fact,  the  task-based  approach  seems 
principally designed to generate tasks which are technical in nature. It was
our expectation that this approach would provide satisfactory coverage of the
technical domain.

We  had  no  such  expectation  with  respect  to  the  supervisory  domain. 
Supervisory  behaviors  tend  to  be  continuous  rather  than  discrete,  are  not 
easily  observable  and  measurable,  and  are  difficult  to  fix  in  time.  Since  it 
is  difficult  to  translate  leader  behaviors  into  tasks,  those  generating  task 
inventories  may  omit  such  behaviors  entirely  or  represent  them  inadequately. 

We felt the task-based approach provided useful information and should be
included in our overall strategy. The dilemma was how to ensure that the task
lists generated provided adequate representation of supervisory job
requirements.




Our basic strategy was simply to expand the sources we used to generate
our consolidated task list. Fortunately, sources were available which, when
combined, gave us reasonable confidence that we were covering the supervisory
domain. Alma Steinberg, a contributor to the next paper, and her colleagues
(Steinberg, van Rijn & Hunter, 1986) had, through extensive interviews,
generated a comprehensive task list focused on leader requirements from the
junior NCO to the senior officer level. Ilene Cast, our next speaker, had
generated a list of leader tasks based on critical incidents which was less
exhaustive than the list generated by Steinberg but which tended to be more
focused at the junior NCO level.

Following  a  preliminary  data  collection  effort  which  provided  more 
information  about  the  relevance  of  tasks  on  both  lists  for  second  tour 
soldiers,  the  two  lists  were  merged  into  one  through  a  process  designed  to 
retain the most desirable characteristics of each.

At this point, we had a strategy which we believed could provide the
information we sought about measurement content and method, the extent to
which measures could be collapsed across jobs, and the extent to which first
tour measures were appropriate for second tour soldiers. The following papers
will explore how this strategy was applied and what answers were generated by
it.

References 

Ash,  R.  A.  (1982).  Job  elements  for  task  clusters:  Arguments  for  using
multi-methodological  approaches  and  a  demonstration  of  their  utility.
Public  Personnel  Management  Journal,  11,  80-90.




Campbell,  C.  H.,  Campbell,  R.  C.,  Rumsey,  M.  G.,  &  Edwards,  D.  C.  (1985).
Development  and  field  test  of  task-based  MOS-specific  criterion  measures
(Tech.  Rep.  No.  717).  Alexandria,  VA:  U.S.  Army  Research  Institute.

Devries,  P.  B.,  Eschenbrenner,  A.  J.,  &  Ruck,  H.  W.  (1980).  Task  analysis
handbook  (Tech.  Rep.  No.  79-45).  Brooks  Air  Force  Base,  TX:  Air  Force
Human  Resources  Laboratory.

Fine,  S.  A.  (1974).  Functional  job  analysis:  An  approach  to  a  technology  for
manpower  planning.  Personnel  Journal,  53,  813-818.

Hebein,  J.,  Kaplan,  A.,  Miller,  R.,  Olmstead,  J.,  &  Sharon,  B.  (1984).  NCO
leadership:  Tasks,  skills,  and  functions  (Research  Note  No.  84-95).
Alexandria,  VA:  Human  Resources  Research  Organization.

Steinberg,  A.  G.,  van  Rijn,  P.,  &  Hunter,  F.  T.  (1986).  Leader  requirements 
task  analysis.  Paper  presented  at  the  28th  Annual  Military  Testing 
Association  Conference,  Mystic,  Connecticut. 

Toquam,  J.  L.,  McHenry,  J.  J.,  Corpe,  V.  A.,  Rose,  S.  R.,  Lammlein,  S.  E.,
Kemery,  E.,  Borman,  W.  C.,  Mendel,  R.,  &  Bosshardt,  M.  J.  (in  preparation).
Development  and  field  test  of  behaviorally  anchored  rating  scales
for  nine  MOS.

Wallis,  M.  R.,  Korotkin,  A.  L.,  Yarkin-Levin,  K.,  Schemmer,  F.  M.,  &  Mumford,
M.  D.  (1986).  Leadership  job  dimensions  and  competency  requirements  for
commissioned  and  noncommissioned  officers:  Remediation  of  inadequacies
in  existing  data  bases  (Research  Note  No.  86-20).  Alexandria,  VA:  U.S.
Army  Research  Institute.




